Further test improvements #9

Merged: 18 commits merged into main on Feb 21, 2024


Fiwo735 (Collaborator) commented Feb 6, 2024

  • Improved Python testing, making it the default and recommended testing option.
  • Removed duplicated test case (same functionality as _example)
  • Updated AST diagram to the new version

Fiwo735 (Collaborator, Author) commented Feb 6, 2024

@Jpnock please let me know if you have any further suggestions regarding testing.
I'd also appreciate it if you could take a look at the GitHub Actions workflow, as I've updated it to use the new scripts/test.py.

Fiwo735 requested a review from Jpnock on February 6, 2024 23:19
Fiwo735 self-assigned this on Feb 6, 2024
Jpnock (Collaborator) commented Feb 7, 2024

Many thanks for updating the diagram : )

I've summarised the differences between the Python and bash scripts below, which should probably be addressed. As for the GitHub Actions issue, this is likely due to there being no TTY available, so these code paths need to be disabled in CI.

Python test script differences

  • If make fails, the script continues to execute (there's no exit code check on either the normal or the coverage path)
  • There's no exit code check on make clean
  • The make clean stage doesn't use the "-C", PROJECT_LOCATION argument like all of the other make commands
  • ASAN_OPTIONS=exitcode=0 environment variable needs to be set when executing the student's compiler binary to prevent ASAN from forcing the test to fail if there's a memory leak etc.
  • While --silent is a nice addition, make prints nothing until it completes, so the script looks stuck for about 10 seconds when you run it (I suggest a one-line print to say that we're building the compiler)
  • The verbose output should be updated to match the output of the bash testing script to improve readability (see test_sh.log vs test_py_verbose.log)
  • There's no timeout on the C90 compilation stage to prevent one bad test hanging the rest (previously 15s I believe)
  • Subject to opinion but I think the verbose output should probably be the default, with a --short option to just show the progress bar (since the full output introduces students to where the appropriate log/assembly files are for each test without them having to add a flag to figure this out)
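A minimal sketch of a build step that would address the make-related points above: every make call goes through -C PROJECT_LOCATION, each exit code is checked, and a one-line message is printed before the silent build starts. The helper names (make_command, run_step, build) and the bin/c_compiler target are assumptions for illustration, not the actual contents of scripts/test.py.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def make_command(target: str, project_location: Path, silent: bool = True) -> List[str]:
    """Build a make invocation that always runs against the project root."""
    cmd = ["make", "-C", str(project_location)]
    if silent:
        cmd.append("--silent")  # GNU make's long form of -s
    cmd.append(target)
    return cmd


def run_step(description: str, cmd: List[str]) -> bool:
    """Print a one-line status message, run cmd, and report whether it succeeded."""
    print(description, flush=True)  # flush so the message appears before make runs
    return subprocess.run(cmd).returncode == 0


def build(project_location: Path) -> None:
    """Clean and rebuild the compiler, stopping the script on the first failure."""
    # make clean can fail too, e.g. on files left behind by an earlier run as root
    if not run_step("Cleaning previous build...", make_command("clean", project_location)):
        sys.exit("Error: make clean failed (check file permissions in bin/)")
    # Hypothetical target name; the real Makefile target may differ
    if not run_step("Building compiler...", make_command("bin/c_compiler", project_location)):
        sys.exit("Error: build failed, see the make output above")
```

Keeping command construction separate from execution makes the -C argument easy to check in one place, and sys.exit() with a message prints it to stderr and exits with status 1.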

GitHub Actions issue

I assume a TTY doesn't exist in the Actions runtime, so stty size likely fails with a non-zero exit code and writes nothing to stdout, which makes the following line raise a ValueError:

    _, max_line_length = os.popen("stty size", "r").read().split()
    ValueError: not enough values to unpack (expected 2, got 0)

Regardless, the progress bar code should be disabled in GitHub Actions; otherwise you get output that looks like test_py.log. I think we'd also want the full verbose output in the Actions logs.
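Rather than parsing stty output, the width could come from the standard library's shutil.get_terminal_size() (test.py already imports shutil), which checks the COLUMNS environment variable, then queries the terminal attached to standard output, and returns the supplied fallback instead of raising when there is no terminal. A minimal sketch; the helper name and the 80-column default are assumptions.

```python
import shutil


def max_line_length(default: int = 80) -> int:
    """Return the terminal width, or `default` when no terminal is attached (e.g. in CI)."""
    columns = shutil.get_terminal_size(fallback=(default, 24)).columns
    # Some pseudo-terminals report a width of 0, so guard against that as well
    return columns if columns > 0 else default
```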

Other notes

Have we got a plan for how we're going to update the marking code to fit in with this Python script? My concern is that the marking script currently extends the existing test.sh bash script, so if we start running this script in CI, we'll have two scripts that could report different numbers of passing seen tests.

At 400 LOC, the Python script is much more daunting for students to read and understand, so I suggest that we at least move the ProgressBar class to a different file.
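For illustration, a progress bar moved into its own small module (a hypothetical scripts/progress_bar.py) could be as short as the sketch below, so test.py would only construct it and call update(). This is an assumption, not the PR's ProgressBar: it redraws two lines in place with the ANSI "cursor up" sequence, which is exactly the kind of output that should be skipped when there is no TTY.

```python
import sys
from typing import TextIO

CURSOR_UP_TWO_LINES = "\x1b[2A"  # ANSI escape: move the cursor up two lines


class ProgressBar:
    """Minimal in-place progress bar for a fixed number of test cases."""

    def __init__(self, total: int, width: int = 64, stream: TextIO = sys.stdout) -> None:
        self.total = total
        self.width = width
        self.stream = stream
        self.passed = 0
        self.failed = 0
        self._drawn = False  # nothing to overwrite before the first draw

    def update(self, passed: bool) -> None:
        """Record one finished test case and redraw the bar."""
        if passed:
            self.passed += 1
        else:
            self.failed += 1
        self._draw()

    def _draw(self) -> None:
        done = self.passed + self.failed
        filled = self.width * done // self.total if self.total else self.width
        if self._drawn:
            self.stream.write(CURSOR_UP_TWO_LINES)  # overwrite the previous two lines
        self.stream.write(f"Running Tests [{'#' * filled}{' ' * (self.width - filled)}]\n")
        self.stream.write(
            f"Pass: {self.passed:2} | Fail: {self.failed:2} | Remaining: {self.total - done:2}\n"
        )
        self.stream.flush()
        self._drawn = True
```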

test_sh.log

scripts/test.sh > test_sh.log

compiler_tests/_example/example.c
	> Pass

compiler_tests/array/declare_global.c
	> Failed to compile testcase: 
	 ./bin/output/array/declare_global/declare_global.compiler.stderr.log 
	 ./bin/output/array/declare_global/declare_global.compiler.stdout.log 
	 ./bin/output/array/declare_global/declare_global.s 
	 ./bin/output/array/declare_global/declare_global.s.printed

compiler_tests/array/declare_local.c
	> Failed to compile testcase: 
	 ./bin/output/array/declare_local/declare_local.compiler.stderr.log 
	 ./bin/output/array/declare_local/declare_local.compiler.stdout.log 
	 ./bin/output/array/declare_local/declare_local.s 
	 ./bin/output/array/declare_local/declare_local.s.printed

test_py.log

scripts/test.py > test_py.log

Running Tests [                                                                ]
Pass: 0 | Fail: 0 | Remaining: 87
See logs for more details (use -v for verbose output).
\x1b[3A
Running Tests [                                                                ]
Pass:  0 | Fail:  0 | Remaining: 87 
See logs for more details (use -v for verbose output).
\x1b[3A
Running Tests [\x1b[92m#\x1b[0m                                                               ]
Pass:  1 | Fail:  0 | Remaining: 86 
See logs for more details (use -v for verbose output).
\x1b[3A
Running Tests [\x1b[92m#\x1b[0m\x1b[91m#\x1b[0m                                                              ]
Pass:  1 | Fail:  1 | Remaining: 85 
See logs for more details (use -v for verbose output).
\x1b[3A
Running Tests [\x1b[92m#\x1b[0m\x1b[91m#\x1b[0m                                                              ]
Pass:  1 | Fail:  2 | Remaining: 84 
See logs for more details (use -v for verbose output).
\x1b[3A
Running Tests [\x1b[92m#\x1b[0m\x1b[91m#\x1b[0m\x1b[91m#\x1b[0m                                                             ]
Pass:  1 | Fail:  3 | Remaining: 83 
See logs for more details (use -v for verbose output).
\x1b[3A
Running Tests [\x1b[92m#\x1b[0m\x1b[91m#\x1b[0m\x1b[91m#\x1b[0m\x1b[91m#\x1b[0m                                                            ]
Pass:  1 | Fail:  4 | Remaining: 82 
See logs for more details (use -v for verbose output).
\x1b[3A
Running Tests [\x1b[92m#\x1b[0m\x1b[91m#\x1b[0m\x1b[91m#\x1b[0m\x1b[91m#\x1b[0m\x1b[91m#\x1b[0m                                                           ]
Pass:  1 | Fail:  5 | Remaining: 81 
See logs for more details (use -v for verbose output).
\x1b[3A

test_py_verbose.log

scripts/test.py -v > test_py_verbose.log

Running Tests [                                                                ]
Pass: 0 | Fail: 0 | Remaining: 86
See logs for more details (use -v for verbose output).
\x1b[3A
Running Tests [                                                                ]
\x1b[92mPass:  0\x1b[0m | \x1b[91mFail:  0\x1b[0m | Remaining: 86 
See logs for more details (use -v for verbose output).
/workspaces/langproc-2022-cw/compiler_tests/_example/example.c
	> Pass
/workspaces/langproc-2022-cw/compiler_tests/array/declare_global.c
	> Fail: see /workspaces/langproc-2022-cw/bin/output/array/declare_global/declare_global.compiler.stderr.log and /workspaces/langproc-2022-cw/bin/output/array/declare_global/declare_global.compiler.stdout.log
/workspaces/langproc-2022-cw/compiler_tests/array/declare_local.c
	> Fail: see /workspaces/langproc-2022-cw/bin/output/array/declare_local/declare_local.compiler.stderr.log and /workspaces/langproc-2022-cw/bin/output/array/declare_local/declare_local.compiler.stdout.log
/workspaces/langproc-2022-cw/compiler_tests/array/index_constant.c
	> Fail: see /workspaces/langproc-2022-cw/bin/output/array/index_constant/index_constant.compiler.stderr.log and /workspaces/langproc-2022-cw/bin/output/array/index_constant/index_constant.compiler.stdout.log
/workspaces/langproc-2022-cw/compiler_tests/array/index_expression.c
	> Fail: see /workspaces/langproc-2022-cw/bin/output/array/index_expression/index_expression.compiler.stderr.log and /workspaces/langproc-2022-cw/bin/output/array/index_expression/index_expression.compiler.stdout.log
/workspaces/langproc-2022-cw/compiler_tests/array/index_variable.c

Fiwo735 (Collaborator, Author) commented Feb 7, 2024

Is there a point of doing a return code check on make clean given it's only using rm -f which cannot fail?

Jpnock (Collaborator) commented Feb 7, 2024

> Is there a point of doing a return code check on make clean given it's only using rm -f which cannot fail?

rm -f can fail in a few cases, notably permission errors (e.g. if someone accidentally runs the script with sudo or as root and later runs it as a non-root user):

root@host:~# cd /tmp
root@host:/tmp# mkdir test
root@host:/tmp# cd test/
root@host:/tmp/test# touch abc

normaluser@host:/tmp$ rm -rf test/
rm: cannot remove 'test/abc': Permission denied
normaluser@host:/tmp$ echo "$?"
1

Fiwo735 (Collaborator, Author) commented Feb 9, 2024

@Jpnock I believe all your suggestions have been addressed through the recent commits. Some remarks:

  • I have addressed the ASAN_OPTIONS and TTY ValueError points, but haven't tested those fixes yet
  • Splitting test.py: in my opinion, it should stay as one file, as it is less intimidating that way and the program flow is not too difficult to trace for students who wish to dive deeper into Python (but no one realistically has to ever open that file for the coursework)

.github/workflows/integration.yml (outdated, resolved)
Dockerfile (outdated)
Comment on lines 25 to 28
# Install Python packages
RUN pip install --no-cache-dir --upgrade pip && \
pip install --no-cache-dir colorama

Collaborator

Is there a way we can avoid this? I'm not the biggest fan of getting people to rebuild their container after they've already started. If it's needed, move it to the end of the Dockerfile so that the earlier layers are likely to be cached, making the rebuild much faster.

Additionally, we should probably be tracking any dependencies in a requirements.txt file.

Collaborator (Author)

Good point. The reason for colorama is that it's a portable (hopefully covering every system students might want to use) and lightweight way of printing in colour, much better than directly emitting the ANSI escape codes (\033... etc.) and likely less confusing for those trying to understand test.py.
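One way to keep colorama without forcing everyone to rebuild their container would be to treat it as optional. A minimal sketch, assuming colour is purely cosmetic: if the import fails, the script falls back to plain, uncoloured text. The colour() helper is hypothetical.

```python
try:
    from colorama import Fore, Style, init as colorama_init

    colorama_init()  # wraps stdout so the colour codes also work on Windows consoles
    GREEN, RED, RESET = Fore.GREEN, Fore.RED, Style.RESET_ALL
except ImportError:
    # colorama is not installed: fall back to uncoloured output
    GREEN = RED = RESET = ""


def colour(text: str, code: str) -> str:
    """Wrap text in a colour code (a no-op when colorama is unavailable)."""
    return f"{code}{text}{RESET}" if code else text
```

With this, a missing package only costs the colours rather than breaking the test script.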

Collaborator (Author)

I'm slightly in favor of holding off on requirements.txt until we have more than one dependency: while it's obviously the better solution, for now the current approach (hopefully) seems more straightforward for students.

compiler_tests/default/test_RETURN.c (resolved)
docs/coverage.md (outdated, resolved)
docs/coverage.md (outdated, resolved)
scripts/test.py (outdated)
import shutil
import subprocess
import queue
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Optional
from colorama import init, Fore
Jpnock (Collaborator) commented Feb 14, 2024

Is there a special reason for needing this library? If not then I think removing this simplifies a bunch of things and wouldn't require everyone to rebuild their docker image or manually install the library before their test script starts working again.

Not sure what it's for, so just speculating: if this is to get the script working inside Actions CI, just check sys.stdout.isatty() instead and don't initialise the progress bar if there's no TTY.
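A minimal sketch of that isatty() approach: decide once at start-up whether to draw the progress bar, and use the verbose, line-by-line output otherwise. Calling isatty() on a stream that is not a terminal (a pipe, a file, or a CI log) simply returns False rather than raising. Also checking the GITHUB_ACTIONS environment variable, which GitHub sets to "true" on its runners, is an extra assumption beyond the suggestion above.

```python
import os
import sys
from typing import Mapping, Optional, TextIO


def use_progress_bar(stream: Optional[TextIO] = None, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when output goes to an interactive terminal outside CI."""
    stream = sys.stdout if stream is None else stream
    env = os.environ if env is None else env
    if env.get("GITHUB_ACTIONS") == "true":
        return False  # prefer the full verbose output in GitHub Actions logs
    return stream.isatty()
```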

Collaborator (Author)

[reason for colorama explained in other comment]
sys.stdout.isatty() sounds better than the current approach of a try... except in main() when constructing ProgressBar, but is it as safe in your opinion, @Jpnock?

)
return e.returncode
except subprocess.TimeoutExpired as e:
print(f"{e.cmd} took more than {e.timeout}")
Collaborator

fail_testcase is not called here

Fiwo735 (Collaborator, Author) commented Feb 14, 2024

What exactly do you mean here, @Jpnock?
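For reference, a minimal sketch of a compile step that covers the ASAN_OPTIONS and timeout points and reports a timeout as a failed test case instead of only printing it. The on_fail callback stands in for fail_testcase, whose real signature isn't shown in this excerpt; the 15-second default mirrors the limit recalled earlier for the bash script, and returning 124 borrows GNU timeout's exit status. All three are assumptions.

```python
import os
import subprocess
from typing import Callable, List

TIMEOUT_SECONDS = 15  # assumed to match the previous bash script's limit
TIMEOUT_RETURN_CODE = 124  # borrowed from GNU timeout's exit status


def run_compiler(cmd: List[str], on_fail: Callable[[str], None], timeout: float = TIMEOUT_SECONDS) -> int:
    """Run the student's compiler, reporting any failure (including a timeout) via on_fail."""
    env = os.environ.copy()
    env["ASAN_OPTIONS"] = "exitcode=0"  # stop ASAN forcing a failure on e.g. memory leaks
    try:
        result = subprocess.run(cmd, env=env, timeout=timeout, capture_output=True, text=True)
    except subprocess.TimeoutExpired as e:
        # subprocess.run kills the child before re-raising, so it cannot hang later tests
        on_fail(f"{e.cmd} took more than {e.timeout} seconds")
        return TIMEOUT_RETURN_CODE
    if result.returncode != 0:
        on_fail(f"{cmd} exited with code {result.returncode}")
    return result.returncode
```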

scripts/test.py (resolved)
scripts/test.py (resolved)
test.sh (resolved)
Jpnock (Collaborator) commented Feb 14, 2024

> @Jpnock I believe all your suggestions have been addressed through the recent commits. Some remarks:

Thanks for the changes :) I've replied with some follow up comments which are hopefully useful

Also, just a note on your previous point: "but no one realistically has to ever open that file for the coursework". Last year quite a lot of people asked me to walk them through the testing script, so even if nobody has to open the file, lots of people definitely did try to understand it, and we should bear this in mind. Hiding parts like the progress bar and the JUnit CI test file creation away in separate files seems like a good idea, since you don't need to understand those at all to understand how the main part of the script works.
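As an illustration of that split, JUnit report generation could be a self-contained helper that test.py calls once at the end. A minimal sketch using only the standard library's xml.etree.ElementTree; the function name and the results format (test name mapped to pass/fail) are hypothetical, and real CI consumers may expect extra attributes such as per-test timings.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def junit_xml(results: Dict[str, bool], suite_name: str = "compiler_tests") -> str:
    """Render test results as a minimal JUnit XML report."""
    failures = sum(1 for ok in results.values() if not ok)
    suite = ET.Element("testsuite", name=suite_name, tests=str(len(results)), failures=str(failures))
    for name, ok in sorted(results.items()):
        case = ET.SubElement(suite, "testcase", name=name)
        if not ok:
            # A <failure> child marks the test case as failed for JUnit consumers
            ET.SubElement(case, "failure", message="test failed")
    return ET.tostring(suite, encoding="unicode")
```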

Fiwo735 (Collaborator, Author) commented Feb 14, 2024

Thanks for the suggestions! I've left 3 comments unresolved - please have a look at them.

Re: splitting test.py - I'm still slightly of the opinion that one file is less scary than two (or, preferably, even more, since just throwing everything non-crucial into a helper file isn't best practice), and it makes it look like there isn't much Python involved in this coursework, so it's less intimidating. But that's not a very strong opinion, so I'm going to check what John thinks about it.

Fiwo735 merged commit eed378d into main on Feb 21, 2024
Jpnock mentioned this pull request on Feb 21, 2024
10 tasks
2 participants