Feature/benchmarks on windows #184
Conversation
…d of subprocesses (which use way too much RAM) Added pyarrow to suppress pandas future warning
…nts in them work added traceable exit codes
…we're just missing a GAMS license.
skip gams install if no license
summary = run(parse_args(args))

# pack the results into a namedtuple pretending to be a return value from a subprocess call (as above).
res = namedtuple("stdout", ["stdout", "stderr", "returncode"])(summary, "", 0)
I've added a --debug switch (False by default, so no change unless set) to the run_benchmarks script that calls the functions directly instead of in subprocesses. This is because I wanted to hit breakpoints inside the benchmarks, which is impossible with a subprocess call.
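For readers skimming the thread, here is a minimal sketch of that pattern. The names below (run_one, ProcResult, the xl2times command string) are illustrative stand-ins rather than the script's actual layout; parse_args and run stand for the callables shown in the snippet above.

import subprocess
from collections import namedtuple

# Stand-ins for the parse_args/run callables shown in the snippet above.
def parse_args(args): ...
def run(parsed_args): ...

# Same fields that callers read off a subprocess result.
ProcResult = namedtuple("ProcResult", ["stdout", "stderr", "returncode"])

def run_one(tool_args: list[str], debug: bool = False):
    if debug:
        # In-process call: breakpoints inside the tool are hit directly.
        summary = run(parse_args(tool_args))
        # Pack the result so it looks like a subprocess result to callers.
        return ProcResult(summary, "", 0)
    # Default path: spawn a subprocess as before (shell=True so the
    # virtualenv's packages are visible on Windows, per the PR description).
    return subprocess.run(
        "xl2times " + " ".join(tool_args),
        capture_output=True, text=True, shell=True,
    )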
Great idea! So far I've just been launching the debugger and manually specifying the args to xl2times, which is a pain for the benchmarks that have a list of excel files as arg. This should make it much more seamless!

What happens when we use --debug but not --run? Does it still try to run all the benchmarks in parallel? Would that mess up the debugger? If so, maybe we should enforce --debug implies --run.
Yeah, I found commenting out lines in benchmarks.yml was a nicer workflow than running them individually.

> What happens when we use --debug but not --run? Does it still try to run all the benchmarks in parallel? Would that mess up the debugger? If so, maybe we should enforce --debug implies --run

As it is now, --debug works with and without --run, as it gets passed into both run_all_benchmarks() (without --run) and run_a_benchmark() (with --run).

Am I missing something about how these switches are intended to be used?
As I understand, the way run_benchmarks.py works right now is to run all benchmarks in parallel by default, unless --run is specified, in which case it runs a single benchmark. Without --run, if --debug is specified and the developer enables a breakpoint in the code in, say, transforms.py, that breakpoint will potentially be hit on every benchmark that is being run in parallel. So on which one does the debugger stop? Will this break the debugger? I've not tried putting a breakpoint in code that is being run in parallel before. :)

If my understanding is correct, perhaps it is simpler to enforce that if you use --debug you must also use --run?
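If that constraint were adopted, a minimal sketch of how it might look with argparse (the flag names come from this discussion; the parser layout itself is an assumption, not the script's actual code):

import argparse

def parse_args(arg_list: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run the benchmark suite.")
    parser.add_argument("--run", metavar="BENCHMARK",
                        help="run a single named benchmark instead of all of them")
    parser.add_argument("--debug", action="store_true",
                        help="call the tools in-process so breakpoints can be hit")
    args = parser.parse_args(arg_list)
    # Enforce the proposed rule: --debug only makes sense for a single benchmark.
    if args.debug and args.run is None:
        parser.error("--debug requires --run")
    return args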
def main(arg_list: None | list[str] = None):
    args = parse_args(arg_list)
    run(args)
Again, this is so I can call a function directly when --debug is set.
Nice idea. Why not inline run into main and remove run? The only other place it's being used also calls run(parse_args(...)), which is essentially main(...).
Great question.

I did exactly this initially, but kept getting exitcode=1 (even after renumbering all the explicit exit codes). Eventually I worked out that if main() returns a 'truthy' value (i.e. the string now returned by run() in --debug mode), it gets interpreted as an exit code of 1 and the benchmark is seen to have failed, even when it was successful. Hiding this exit code inside run() solved this.

But yeah, I agree with your comment below that using subprocess and exit codes is pretty brittle (especially cross-platform), and it'd be better to run everything as function calls and catch exceptions with detailed error messages on failures.
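For anyone hitting the same gotcha, a minimal standalone illustration (not the project's code): setuptools-style console-script wrappers and __main__ guards often end in sys.exit(main()), and sys.exit treats any non-None, non-integer argument as a failure.

import sys

def main() -> str:
    # Imagine main() returning a summary string, as in --debug mode above.
    return "all benchmarks passed"

if __name__ == "__main__":
    # sys.exit prints a non-integer argument to stderr and exits with status 1,
    # so a run that "succeeds" but returns a string still looks like a failure.
    sys.exit(main())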
Thanks Sam, these are some great improvements to our CI/dev process! I've left a few minor comments/suggestions.

But now, thinking about it again, do we really need subprocess in run_benchmarks? With your changes, we can always call the main functions of dd_to_csv and xl2times directly, and perhaps that eliminates some overhead, even on Linux/macOS. What do you think?
if __name__ == "__main__":
    main()
Do we not need to pass in an arg_list here? (I haven't seen this idiom before.)
Sam, do you need to make this change here too?

- main()
+ main(sys.argv[1:])
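(For context on the idiom: argparse's parse_args falls back to sys.argv[1:] when given None, so the explicit form is equivalent. A small standalone example, not the project's code:)

import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument("--run")

# These two calls parse the same thing: parse_args(None) reads sys.argv[1:].
implicit = parser.parse_args()
explicit = parser.parse_args(sys.argv[1:])
assert implicit == explicit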
Great, thanks a lot @SamRWest! So soon I won't have any excuse not to run benchmarks this way on windows. 😃 I've been mostly using batch scripts to run the tool on a benchmark I was working on and then letting CI point out what I was breaking. 😆

No worries!

Yeah agreed. Subprocesses suck. Whenever I've used them I always spend way too long troubleshooting silly things like cross-OS inconsistencies and character-encoding bugs. Changing the subprocess calls to function calls, and maybe even sticking them into unit tests (so you can run with, say…
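The unit-test idea might look roughly like this (a sketch only: the import path, benchmark names and CLI arguments below are all assumptions, not the project's real layout):

import pytest

# Assumed import path for the converter's main(); adjust to the real package layout.
from xl2times.__main__ import main as xl2times_main

BENCHMARKS = ["DemoS_001", "DemoS_002"]  # placeholder benchmark names

@pytest.mark.parametrize("name", BENCHMARKS)
def test_benchmark(name: str) -> None:
    # Argument layout is assumed; the point is that failures surface as normal
    # test failures (exceptions/asserts) rather than opaque subprocess exit codes.
    xl2times_main(["benchmarks/xlsx/" + name, "--output_dir", "out/" + name])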
…ows' into feature/benchmarks_on_windows # Conflicts: # .gitignore # README.md
I think I've addressed all the comments here now @olejandro. If you're happy, it just needs an approval now.

Issues dealt with in this PR are a bit outside of my area of expertise. :) @siddharth-krishna please feel free to merge if you are happy. I am happy. :)
Just a couple of quick small things, feel free to merge once they have been resolved! Thanks again. 🚀
@@ -3,7 +3,7 @@ requires = ["setuptools>=61.0.0", "wheel"]
 build-backend = "setuptools.build_meta"

 [tool.setuptools]
-packages = ["xl2times"]
+packages = ["xl2times", "utils"]
Would this also install a utils package to the python (virtual)env if someone does pip install . in the root directory? Wondering if that might result in name clashes.

I was thinking of utils more as a collection of scripts for benchmarking/developing, and not as a package to be distributed. Do we need to include it in pyproject.toml? Happy to go with the best practice here.
Ah, apologies, this was a fix for a ModuleNotFoundError caused by a from utils.dd_to_csv import I added in run_benchmarks, but the right solution was to import dd_to_csv directly. I'll revert this change in #186.

But yes, this would have included utils in built wheels etc. I didn't realise it wasn't meant to be included.
Ah crap, it auto-merged once you approved @siddharth-krishna, sorry.

No worries! That sounds good, thanks. :)
Addressing your last two comments (I can't reply directly now it's been merged):
It's not strictly necessary, as argparse will do this for us. But I'll do it anyway so it's clearer what's going on.

The python debuggers I've used (pycharm, vscode) generally work fine with multiprocess breakpoints. They'll usually just break at the same point in every process (though sometimes only on the first hit), which is sometimes useful (if you don't know which process has the bug you're hunting, for example). Granted, this can be confusing if you have all 13 benchmarks enabled and you get stopped at the breakpoint 13 times. A potential alternative could be to disable the process-pool benchmark parallelisation in --debug mode. This might be easier to understand, and better suited to the comment-out-stuff-in-benchmarks.yml workflow?
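A sketch of that alternative, assuming a hypothetical run_one(benchmark, debug) helper and benchmark list (the names and structure are illustrative, not the script's actual code):

from concurrent.futures import ProcessPoolExecutor

def run_one(benchmark: str, debug: bool = False):
    """Hypothetical per-benchmark runner (in-process when debug=True)."""
    ...

def run_all(benchmarks: list[str], debug: bool = False, max_workers: int = 4):
    if debug:
        # Serial and in-process: a breakpoint fires once per benchmark, in order,
        # and always in the main process where the debugger is attached.
        return [run_one(b, debug=True) for b in benchmarks]
    # Normal mode: fan out across a small pool of worker processes.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, benchmarks))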
fixed bug in git code when no remote is called 'origin'
Cool, didn't know how argparse worked. Sounds good
Great idea! I had anyway wanted to include a…
This PR contains changes necessary to run the benchmarks locally on a Windows machine.
Main changes:
- Introduced the shell=True param to subprocess calls. Without this, the virtualenv was not accessible to the subprocesses, which would fail with e.g. ModuleNotFound: pandas exceptions.
- Python subprocesses on Windows are heavyweight and so use a lot more RAM than on Linux, so I've added a variable in utils that limits the max_workers param in ProcessPoolExecutor constructors to a small number (it used to be the number of CPUs); see the sketch after this list. Without this limitation, the pool-inside-a-pool structure of the benchmarks was spawning 32*32 python.exe processes and using all 64GB of my RAM in a few seconds.
- Added a --debug switch to run_benchmarks to run each benchmark as a function call, allowing use of debuggers/breakpoints.
- CI now runs CSV-only regression tests if the GAMS license isn't in secrets.
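A rough sketch of the worker cap described above (the constant name and value are placeholders; the real variable lives in utils and may differ):

import os
from concurrent.futures import ProcessPoolExecutor

MAX_WORKERS = 4  # placeholder cap; keeps Windows from spawning dozens of python.exe processes

def make_pool() -> ProcessPoolExecutor:
    # Use the smaller of the cap and the CPU count instead of the CPU count alone.
    workers = min(MAX_WORKERS, os.cpu_count() or 1)
    return ProcessPoolExecutor(max_workers=workers)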