Job monitoring utilities #2

Closed
wlandau opened this issue Oct 20, 2023 · 2 comments

@wlandau
Owner

wlandau commented Oct 20, 2023

It would be great to have standalone R function utilities to manage batch jobs. These would run in the user's interactive session outside the targets pipeline / crew controller. I am thinking of covering the same functionality as qsub, qstat, and qdel in SGE (sbatch, squeue, and scancel in SLURM), plus log files. Proposal:

  1. crew_aws_batch_submit(): submit a job that runs some code (R or shell). This could help e.g. submit a targets pipeline as a Batch job which submits other Batch jobs.
  2. crew_aws_batch_status(): get the status of jobs in a given job queue / job definition.
  3. crew_aws_batch_terminate(): terminate one or more jobs with specific job names/IDs/ARNs.
  4. crew_aws_batch_logs(): log files for one or more jobs, or for an entire job definition. This would really help detect tricky worker-level errors such as running out of memory or hitting a price spike that terminates spot instances.
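For illustration only, the first three items in the list above could be sketched on top of the paws Batch client roughly as follows. The `sketch_*` function names and their arguments are hypothetical and are not the crew.aws.batch API; `client` is assumed to be `paws::batch()` or any list exposing the same `submit_job()`, `list_jobs()`, and `terminate_job()` methods. Only the first page of `list_jobs()` results is read here.

```r
# Hypothetical helpers, not the crew.aws.batch API.
# `client` is assumed to be paws::batch() or a list with the same methods.

# Analogue of qsub / sbatch: submit a command and return the job ID.
sketch_submit <- function(client, name, queue, definition, command) {
  out <- client$submit_job(
    jobName = name,
    jobQueue = queue,
    jobDefinition = definition,
    containerOverrides = list(command = as.list(command))
  )
  out$jobId
}

# Analogue of qstat / squeue: list jobs in a queue with a given status.
# Assumption: only the first page of results is needed.
sketch_status <- function(client, queue, status = "RUNNING") {
  out <- client$list_jobs(jobQueue = queue, jobStatus = status)
  jobs <- out$jobSummaryList
  data.frame(
    id = vapply(jobs, function(job) job$jobId, character(1)),
    name = vapply(jobs, function(job) job$jobName, character(1)),
    status = vapply(jobs, function(job) job$status, character(1)),
    stringsAsFactors = FALSE
  )
}

# Analogue of qdel / scancel: terminate one or more jobs by ID.
sketch_terminate <- function(client, ids, reason = "terminated by user") {
  for (id in ids) {
    client$terminate_job(jobId = id, reason = reason)
  }
  invisible(ids)
}
```

With real credentials, usage might look like `sketch_submit(paws::batch(), "my-job", "my-queue", "my-definition", c("Rscript", "-e", "targets::tar_make()"))`, where the job, queue, and definition names are placeholders.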
@wlandau wlandau self-assigned this Oct 20, 2023
@DyfanJones

Hi @wlandau, for crew_aws_batch_logs you could hack smdocker's logging method. In short, when it builds a Docker image using AWS CodeBuild, it returns the AWS CloudWatch logs back to the console for R users to monitor and check.

https://github.com/DyfanJones/sm-docker/blob/main/R/logs.R

I am happy to contribute to this if you think it is a possible solution for your problem :)

@wlandau
Owner Author

wlandau commented Dec 8, 2023

Thanks for the input, @DyfanJones! I think I implemented what I need in https://github.com/wlandau/crew.aws.batch#job-management, but it would be amazing to have help with paws-r/paws#721 so I can request paginated downloads for log files.
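As a rough sketch of what paginated log retrieval could look like (not the crew.aws.batch implementation), the loop below follows the CloudWatch Logs GetLogEvents convention: each response carries a `nextForwardToken`, and the end of the stream is signalled when the returned token equals the one that was passed in. The names `sketch_log_messages()` and `sketch_fetcher()` are hypothetical, and the fetcher assumes the paws CloudWatch Logs client accepts `nextToken` in `get_log_events()`.

```r
# Hypothetical helper. `fetch_page(token)` is assumed to return a list with
# `events` (each with a `message`) and `nextForwardToken`, the shape of a
# CloudWatch Logs GetLogEvents response.
sketch_log_messages <- function(fetch_page, max_pages = 100L) {
  messages <- character(0)
  token <- NULL
  for (page in seq_len(max_pages)) {
    out <- fetch_page(token)
    new <- vapply(out$events, function(event) event$message, character(1))
    messages <- c(messages, new)
    # The stream is exhausted when the service returns the token we sent.
    if (identical(out$nextForwardToken, token)) break
    token <- out$nextForwardToken
  }
  messages
}

# Hypothetical adapter: `logs` is assumed to be paws::cloudwatchlogs()
# or a list with a compatible get_log_events() method.
sketch_fetcher <- function(logs, group, stream) {
  function(token) {
    args <- list(
      logGroupName = group,
      logStreamName = stream,
      startFromHead = TRUE
    )
    if (!is.null(token)) args$nextToken <- token
    do.call(logs$get_log_events, args)
  }
}
```

For an AWS Batch job, the log group is typically `/aws/batch/job` and the stream name is reported in the job description (`container$logStreamName` in the output of `describe_jobs()`), so a call might look like `sketch_log_messages(sketch_fetcher(paws::cloudwatchlogs(), "/aws/batch/job", stream_name))`, with `stream_name` as a placeholder. The `max_pages` cap is a safeguard added for this sketch.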
