Effectively use batchtools for Targets-based workflow in SLURM #292

Open
stemangiola opened this issue Jan 25, 2023 · 4 comments

Comments

@stemangiola

Thanks for the great package.

We are converting a makeflow workflow to R using targets + batchtools, for a SLURM system.

However, we find it practically unusable because jobs that fail do not report their failure to batchtools, which thinks they are still executing. The jobs might fail because of memory overflow or a timeout.

Please see ropensci/targets#932

Are you aware of these limitations and do you know a way to solve this?

@HenrikBengtsson

HenrikBengtsson commented Jan 27, 2023

Not the maintainer, but I think you'll increase the chances of getting things fixed or improved if you can come up with an example that illustrates the problem using only batchtools code.

Also, showing the exact slurm template used can increase the chances to reproduce this, and maybe even reproduce it on other schedulers.

The more details, the better.
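
A batchtools-only reproduction along these lines might look roughly like the sketch below. It is not taken from the thread. It assumes the `slurm-simple` template bundled with batchtools, that `walltime` is given in seconds and `memory` in megabytes as that template expects, and a registry directory on a filesystem shared with the compute nodes. The names `use_resources`, `run_reprex` and `reprex-registry`, and all resource values, are hypothetical and would need adjusting to the cluster.

```r
# Hypothetical job: allocate roughly `mb` megabytes, then sleep for `seconds`.
use_resources <- function(mb, seconds) {
  x <- numeric(mb * 1024^2 / 8)  # a double takes 8 bytes
  Sys.sleep(seconds)
  length(x)
}

# Hypothetical driver: submit three jobs and report what batchtools sees.
# `file.dir` is assumed to be on storage visible to the compute nodes.
run_reprex <- function(file.dir = "reprex-registry") {
  reg <- batchtools::makeRegistry(file.dir = file.dir, seed = 1)
  reg$cluster.functions <-
    batchtools::makeClusterFunctionsSlurm(template = "slurm-simple")
  batchtools::saveRegistry(reg = reg)

  # Job 1 is meant to finish, job 2 to exceed the memory limit,
  # and job 3 to exceed the walltime limit (assumed limits, see below).
  batchtools::batchMap(
    use_resources,
    mb = c(10, 4000, 10),
    seconds = c(1, 1, 600),
    reg = reg
  )
  batchtools::submitJobs(
    resources = list(walltime = 120, memory = 512, ncpus = 1),
    reg = reg
  )

  # Stop waiting after 15 minutes so a hang is visible rather than endless.
  batchtools::waitForJobs(timeout = 900, reg = reg)

  print(batchtools::getStatus(reg = reg))         # counts per job state
  print(batchtools::findExpired(reg = reg))       # jobs that vanished
  print(batchtools::getErrorMessages(reg = reg))  # recorded R errors, if any
  invisible(reg)
}
```

Calling `run_reprex()` on a login node and posting the printed status together with the Slurm template file would show whether the killed jobs are reported as expired or are still counted as running.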

@stuvet
Contributor

stuvet commented Jan 30, 2023

For future visitors, ropensci/targets#570 (comment) may help.

It looks like the CRAN version of batchtools does not yet include the fixes needed to achieve stability on Slurm when called via future.batchtools (at least). I suspect it could still be achieved via a custom clusterFunctions, rather than clusterFunctionsSlurm, though there is one more fixed issue that may still cause problems in the current CRAN version.

After a recent chat with @stemangiola, & as per @HenrikBengtsson's suggestion, I'm working on a targets reprex (handling OOM errors & timeouts properly) & I'll repost it here so that others can check their configurations & package versions.

@HenrikBengtsson

HenrikBengtsson commented Jan 30, 2023

A reproducible example based on targets is a first step, but I think you'll significantly increase the chances of a faster response/fix if you make it use vanilla batchtools code. If not, you're basically asking whoever is going to look into this to do that work, i.e. to peel off the targets and the future.batchtools code to find what needs to be fixed in batchtools.

@stuvet
Contributor

stuvet commented Jan 30, 2023

I only mention it because I strongly suspect the work has already been done (as mentioned in the previous link) & implemented in the GitHub version of batchtools.

Before people submit new issues to targets, future.batchtools or batchtools, it feels important for them to be able to validate their own hardware & check that their toolchains have already been updated to include the existing bugfixes.
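
One way to check which versions are installed, and whether they came from CRAN or from GitHub, is sketched below. It is not from the thread: `describe_install` is a hypothetical helper, and it assumes that GitHub installs were made with a tool such as remotes, which records `RemoteType` and `RemoteSha` fields in the package DESCRIPTION.

```r
# Hypothetical helper: report version and origin of one installed package.
describe_install <- function(pkg) {
  # system.file() returns "" when the package is not installed.
  if (!nzchar(system.file(package = pkg))) {
    return(data.frame(package = pkg, version = NA_character_,
                      origin = "not installed"))
  }
  desc <- utils::packageDescription(pkg)
  origin <- if (!is.null(desc$RemoteType)) {
    # Written by remotes-style installers, e.g. "github@1a2b3c4".
    paste0(desc$RemoteType, "@", substr(desc$RemoteSha, 1, 7))
  } else if (!is.null(desc$Repository)) {
    desc$Repository  # typically "CRAN"
  } else {
    "unknown"
  }
  data.frame(package = pkg,
             version = as.character(utils::packageVersion(pkg)),
             origin = origin)
}

# The packages discussed in this thread.
toolchain <- c("batchtools", "future.batchtools", "future", "targets")
print(do.call(rbind, lapply(toolchain, describe_install)))
```

Pasting this table into a new issue, alongside `sessionInfo()`, makes it clear whether the CRAN or the GitHub version of batchtools was in use.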


3 participants