
Feature/benchmarks on windows #184

Merged: 17 commits from feature/benchmarks_on_windows into main on Feb 15, 2024

Conversation

@SamRWest (Collaborator) commented Feb 14, 2024

This PR contains changes necessary to run the benchmarks locally on a Windows machine.

Main changes:

  • Introduced the shell=True param to subprocess calls. Without it, the virtualenv was not visible to the subprocesses, which would fail with e.g. ModuleNotFoundError exceptions for pandas (see the sketch after this list).

  • Python subprocesses on Windows are heavyweight and use far more RAM than on Linux, so I've added a variable in utils that caps the max_workers param of the ProcessPoolExecutor constructors to a small number (it used to be the number of CPUs). Without this cap, the pool-inside-a-pool structure of the benchmarks was spawning 32*32 python.exe processes and exhausting all 64 GB of my RAM within a few seconds.

  • Added a --debug switch to run_benchmarks that runs each benchmark as a function call, allowing the use of debuggers/breakpoints.

  • CI now runs CSV-only regression tests if the GAMS license isn't in secrets.
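
Roughly, the first two changes look like this. This is a minimal sketch rather than the PR's exact code: the MAX_WORKERS cap of 4 and the benchmark command line are illustrative.

```python
import subprocess
from concurrent.futures import ProcessPoolExecutor
from os import cpu_count

# Cap the pool size so nested pools on Windows don't spawn cpus * cpus processes.
# The cap of 4 is illustrative; the PR adds a similar constant in xl2times/utils.py.
MAX_WORKERS = min(4, cpu_count() or 1)


def run_one_benchmark(name: str) -> str:
    # shell=True so the subprocess sees the activated virtualenv on Windows,
    # avoiding the ModuleNotFoundError described above.
    result = subprocess.run(
        f"xl2times benchmarks/{name}",  # illustrative command line
        shell=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def run_all_benchmarks(names: list[str]) -> list[str]:
    with ProcessPoolExecutor(max_workers=MAX_WORKERS) as executor:
        return list(executor.map(run_one_benchmark, names))
```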

summary = run(parse_args(args))

# pack the results into a namedtuple pretending to be a return value from a subprocess call (as above).
res = namedtuple("stdout", ["stdout", "stderr", "returncode"])(summary, "", 0)
Collaborator Author:

I've added a --debug switch (False by default, so no change unless set) to the run_benchmarks script that calls the functions directly instead of in subprocesses.
This is because I wanted to hit breakpoints inside the benchmarks, which is impossible with a subprocess call.
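
In outline, the switch might work something like this. This is only a sketch: the import path, function names, and argument handling are assumptions, not the PR's exact code.

```python
import subprocess
from collections import namedtuple

# Assumed import path; the real entry points may be named or located differently.
from xl2times.__main__ import parse_args, run


def run_a_benchmark(xl2times_args: list[str], debug: bool = False):
    if debug:
        # In-process call: breakpoints set inside xl2times are hit by the debugger.
        summary = run(parse_args(xl2times_args))
        # Pack the result so callers can treat it like a subprocess result.
        return namedtuple("stdout", ["stdout", "stderr", "returncode"])(summary, "", 0)
    # Default path: run the tool as a subprocess, as before.
    return subprocess.run(
        "xl2times " + " ".join(xl2times_args),
        shell=True,
        capture_output=True,
        text=True,
    )
```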

Collaborator:

Great idea! So far I've just been launching the debugger and manually specifying the args to xl2times, which is a pain for the benchmarks that have a list of excel files as arg. This should make it much more seamless!

What happens when we use --debug but not --run? Does it still try to run all the benchmarks in parallel? Would that mess up the debugger? If so, maybe we should enforce --debug implies --run

Collaborator Author (@SamRWest, Feb 14, 2024):

Yeah, I found commenting out lines in benchmarks.yml was a nicer workflow than running them individually.

> What happens when we use --debug but not --run? Does it still try to run all the benchmarks in parallel? Would that mess up the debugger? If so, maybe we should enforce --debug implies --run

As it is now, --debug works with and without --run as it gets passed into both run_all_benchmarks() (without --run) and run_a_benchmark() (with --run).
Am I missing something about how these switches are intended to be used?

Collaborator:

As I understand it, the way run_benchmarks.py works right now is to run all benchmarks in parallel by default, unless --run is specified, in which case it runs a single benchmark. Without --run, if --debug is specified and the developer sets a breakpoint in the code in, say, transforms.py, that breakpoint will potentially be hit in every benchmark that is being run in parallel. So on which one does the debugger stop? Will this break the debugger?

I've not tried putting a breakpoint in code that is being run in parallel before. :)

If my understanding is correct, perhaps it is simpler to enforce that if you use --debug you must also use --run?


def main(arg_list: None | list[str] = None):
    args = parse_args(arg_list)
    run(args)
Collaborator Author:

Again, this is so I can call the function directly when --debug is set.

Collaborator:

Nice idea. Why not inline run into main and remove run? The only other place it's being used also calls run(parse_args(...)), which is essentially main(...).

Collaborator Author:

Great question.
I did exactly this initially, but kept getting exit code 1 (even after renumbering all the explicit exit codes).
I eventually worked out that if main() returns a 'truthy' value (i.e. the string that run() now returns in --debug mode), it gets interpreted as exit code 1 and the benchmark is treated as having failed, even when it succeeded.
Keeping that return value inside run() solved it.

But yeah, I agree with your comment below that using subprocess and exit codes is pretty brittle (especially cross-platform) and it'd be better to run everything as function calls and catch exceptions with detailed error messages on failures.
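
For context on why a truthy return value becomes exit code 1: when a console-script wrapper or a __main__ guard passes main()'s result to sys.exit(), a non-empty string is printed to stderr and the interpreter exits with status 1. A standalone illustration (not the project's code):

```python
import sys


def main() -> str | None:
    # Imagine --debug mode returning the benchmark summary string here.
    return "3 benchmarks passed"


if __name__ == "__main__":
    # sys.exit(None) exits with status 0, but sys.exit("some string") prints
    # the string to stderr and exits with status 1, so a successful run that
    # returns a truthy summary looks like a failure to the calling process.
    sys.exit(main())
```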

@siddharth-krishna (Collaborator) left a comment:

Thanks Sam, these are some great improvements to our CI/dev process! I've left a few minor comments/suggestions.

But now, thinking about it again, do we really need subprocess in run_benchmarks? With your changes, we can always call the main functions of dd_to_csv and xl2times directly, and perhaps that eliminates some overhead, even in linux/macOS. What do you think?



if __name__ == "__main__":
    main()
Collaborator:

Do we not need to pass in an arg_list here? (I haven't seen this idiom before)

Collaborator:

Sam, do you need to make this change here too?

Suggested change:
-    main()
+    main(sys.argv[1:])



@olejandro (Member):

Great, thanks a lot @SamRWest! So soon I won't have any excuse not to run benchmarks this way on Windows. 😃 I've been mostly using batch scripts to run the tool on a benchmark I was working on and then letting CI point out what I was breaking. 😆

@SamRWest (Collaborator Author):

> Great, thanks a lot @SamRWest! So soon I won't have any excuse not to run benchmarks this way on Windows. 😃 I've been mostly using batch scripts to run the tool on a benchmark I was working on and then letting CI point out what I was breaking. 😆

No worries!
I'm a firm believer that speeding up the feedback loop is always better for productivity.

@SamRWest (Collaborator Author):

> Thanks Sam, these are some great improvements to our CI/dev process! I've left a few minor comments/suggestions.
>
> But now, thinking about it again, do we really need subprocess in run_benchmarks? With your changes, we can always call the main functions of dd_to_csv and xl2times directly, and perhaps that eliminates some overhead, even in linux/macOS. What do you think?

Yeah agreed. Subprocesses suck. Whenever I've used them I always spend way too long troubleshooting silly things like cross-OS inconsistencies and character encoding bugs.

Changing the subprocess calls to function calls, and maybe even sticking them into unit tests (so you can run them with, say, pytest -m benchmarks), would be a lot simpler and make it easier to debug using exception messages rather than exit codes (a rough sketch of the idea follows).
Probably one for a future PR though, as it'd be a pretty big change to the pipeline.
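
As a rough illustration of that idea, a benchmark could be wrapped in a pytest test behind a custom marker. Everything here is hypothetical (the marker name, import path, and arguments are not from this PR):

```python
import pytest

# The marker would need to be registered, e.g. in pyproject.toml under
# [tool.pytest.ini_options]: markers = ["benchmarks: long-running regression benchmarks"]


@pytest.mark.benchmarks
def test_demo_benchmark():
    # Hypothetical in-process call: a failure raises an ordinary exception with
    # a full traceback instead of surfacing as a subprocess exit code.
    from xl2times.__main__ import main  # assumed import path

    main(["benchmarks/some_model"])  # assumed arguments


# Select only these tests with: pytest -m benchmarks
```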

Merge commit: "…ows' into feature/benchmarks_on_windows" (conflicts resolved in .gitignore and README.md)
@SamRWest enabled auto-merge (squash) February 15, 2024 02:25
@SamRWest (Collaborator Author):

I think I've addressed all the comments here now, @olejandro. If you're happy, it just needs an approval.

@olejandro (Member):

Issues dealt with in this PR are a bit outside of my area of expertise. :) @siddharth-krishna please feel free to merge if you are happy. I am happy. :)

@siddharth-krishna (Collaborator) left a comment:

Just a couple of quick small things, feel free to merge once they have been resolved! Thanks again. 🚀

pyproject.toml:

@@ -3,7 +3,7 @@ requires = ["setuptools>=61.0.0", "wheel"]
 build-backend = "setuptools.build_meta"

 [tool.setuptools]
-packages = ["xl2times"]
+packages = ["xl2times", "utils"]
Collaborator:

Would this also install a utils package into the Python (virtual) env if someone does pip install . in the root directory? Wondering if that might result in name clashes.

I was thinking of utils more as a collection of scripts for benchmarking/developing, and not as a package to be distributed. Do we need to include it in pyproject.toml? Happy to go with the best practice here.

Collaborator Author:

Ah, apologies: this was a fix for a ModuleNotFoundError caused by a "from utils.dd_to_csv import ..." statement I added in run_benchmarks, but the right solution was to import dd_to_csv directly.
I'll revert this change in #186.

But yes, this would have included utils in built wheels etc. I didn't realise it wasn't meant to be included.





@SamRWest merged commit 9c36c37 into main Feb 15, 2024
1 check passed
@SamRWest deleted the feature/benchmarks_on_windows branch February 15, 2024 04:39
@SamRWest (Collaborator Author):

Ah crap, it auto-merged once you approved @siddharth-krishna, sorry.
I'll address your comments before merging #186 instead.

@siddharth-krishna (Collaborator):

No worries! That sounds good, thanks. :)

@SamRWest (Collaborator Author) commented Feb 15, 2024:

Addressing your last two comments (I can't reply directly now that it's been merged):

> Sam, do you need to make this change here too?

It's not strictly necessary, as argparse will do this for us. But I'll do it anyway so it's clearer what's going on.
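
For reference, parse_args() reads sys.argv[1:] itself when it is given no argument list (or None), which is why the explicit list is optional. A quick standalone illustration:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--debug", action="store_true")

print(parser.parse_args(["--debug"]))  # Namespace(debug=True)

# parser.parse_args() and parser.parse_args(None) are equivalent to
# parser.parse_args(sys.argv[1:]): argparse falls back to sys.argv[1:] when no
# argument list is supplied, so main() still picks up the real command line.
```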

> Without --run, if --debug is specified and the developer sets a breakpoint in the code in, say, transforms.py, that breakpoint will potentially be hit in every benchmark that is being run in parallel. So on which one does the debugger stop? Will this break the debugger?

The Python debuggers I've used (PyCharm, VS Code) generally work fine with multiprocess breakpoints. They'll usually just break at the same point in every process (though sometimes only on the first hit), which can be useful if you don't know which process has the bug you're hunting.

Granted, this can be confusing if you have all 13 benchmarks enabled and get stopped at the breakpoint 13 times.

A potential alternative could be to disable the process-pool benchmark parallelisation in --debug mode. This might be easier to understand, and better suited to the comment-out-stuff-in-benchmarks.yml workflow?

SamRWest added a commit that referenced this pull request Feb 15, 2024
fixed bug in git code when no remote is called 'origin'
@siddharth-krishna (Collaborator):

> It's not strictly necessary, as argparse will do this for us. But I'll do it anyway so it's clearer what's going on.

Cool, I didn't know that's how argparse worked. Sounds good.

> A potential alternative could be to disable the process-pool benchmark parallelisation in --debug mode. This might be easier to understand, and better suited to the comment-out-stuff-in-benchmarks.yml workflow?

Great idea! I had wanted to include a --no-parallel flag anyway, because right now the reported runtimes aren't very meaningful given how much parallelism and non-determinism is involved, so it would be useful to be able to run all the benchmarks in series when benchmarking performance changes. --debug could be used for this.
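
A possible shape for such a flag, sketched with illustrative names (the real change would live in run_benchmarks.py and is not part of this PR):

```python
from concurrent.futures import ProcessPoolExecutor

MAX_WORKERS = 4  # illustrative cap, as discussed in the PR description


def run_a_benchmark(name: str) -> str:
    # Stand-in for the real per-benchmark runner.
    return f"{name}: ok"


def run_all_benchmarks(names: list[str], parallel: bool = True) -> list[str]:
    if not parallel:
        # Serial path for --no-parallel (or --debug): one benchmark at a time,
        # deterministic ordering, and runtimes that are comparable across runs.
        return [run_a_benchmark(n) for n in names]
    with ProcessPoolExecutor(max_workers=MAX_WORKERS) as executor:
        return list(executor.map(run_a_benchmark, names))
```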
