Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete job files when rerunning; customizable execution command #201

Merged
merged 7 commits into from
Nov 8, 2024

Conversation

gpetretto
Copy link
Contributor

Closing #197 and #196.

Delete files on rerun

As discussed in #197, this will remove the files from the worker when a Job is rerun.

  1. The removal is applied by default, but can be deactivated
  2. If an error happens while removing the files, the Job will not be rerun
  3. If failing to remove files due to the fact that the jfremote_in.json is missing a warning will be printed, but Job will be rerun

Note that this functionality might have a direct impact for @JaGeo: now if a worker has a MFA when rerunning a job it will ask to insert the OTP.

Also I am not 100% convinced about point 2, if it instead should print a warning and still rerun the Job.

Also pinging @QuantumChemist @utf.

Customizable execution command

As discussed in #196 added a new option to customize the jf -fe execution run: execution_cmd

@yw-fang would this be fine?

@codecov-commenter
Copy link

codecov-commenter commented Oct 25, 2024

Codecov Report

Attention: Patch coverage is 73.91304% with 36 lines in your changes missing coverage. Please review.

Project coverage is 73.18%. Comparing base (8245777) to head (51365f6).

Files with missing lines Patch % Lines
src/jobflow_remote/utils/remote.py 78.33% 9 Missing and 4 partials ⚠️
src/jobflow_remote/jobs/jobcontroller.py 68.57% 8 Missing and 3 partials ⚠️
src/jobflow_remote/cli/flow.py 50.00% 2 Missing and 2 partials ⚠️
src/jobflow_remote/cli/utils.py 50.00% 3 Missing and 1 partial ⚠️
src/jobflow_remote/config/base.py 69.23% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #201      +/-   ##
===========================================
+ Coverage    73.08%   73.18%   +0.10%     
===========================================
  Files           47       48       +1     
  Lines         6249     6347      +98     
  Branches       991     1012      +21     
===========================================
+ Hits          4567     4645      +78     
- Misses        1329     1340      +11     
- Partials       353      362       +9     
Files with missing lines Coverage Δ
src/jobflow_remote/cli/job.py 82.22% <ø> (ø)
src/jobflow_remote/jobs/data.py 88.93% <100.00%> (+0.04%) ⬆️
src/jobflow_remote/jobs/runner.py 78.31% <100.00%> (+0.51%) ⬆️
src/jobflow_remote/cli/flow.py 70.00% <50.00%> (-2.12%) ⬇️
src/jobflow_remote/cli/utils.py 63.49% <50.00%> (-0.27%) ⬇️
src/jobflow_remote/config/base.py 83.33% <69.23%> (-0.83%) ⬇️
src/jobflow_remote/jobs/jobcontroller.py 78.68% <68.57%> (+0.36%) ⬆️
src/jobflow_remote/utils/remote.py 78.33% <78.33%> (ø)

This was linked to issues Oct 25, 2024
@yw-fang
Copy link

yw-fang commented Oct 25, 2024

I like this added feature! Thanks very much @gpetretto

@JaGeo
Copy link
Collaborator

JaGeo commented Oct 25, 2024

@gpetretto oh, i see. This might be a lot of work.

@gpetretto
Copy link
Contributor Author

@gpetretto oh, i see. This might be a lot of work.

There is a --no-delete option to skip the deletion. Not sure if that would be convenient enough for you. Otherwise we may consider adding a CLI setting to determine the default behaviour. See for example:

cli_suggestions: bool = Field(

@davidwaroquiers
Copy link
Member

To comment on the problem with the OTP, one thing we discussed with @gpetretto is to also implement the deletion of the files when a job starts. This could be activated through, e.g. a "rerun_delete_strategy" option or something like that. Default would be e.g. "UPON_RERUN" with other two options being "NONE" (don't delete files) and "WHEN_STARTING". In your case @JaGeo, you'd be using this second option. I don't remember what were the downsides of this.

@JaGeo
Copy link
Collaborator

JaGeo commented Oct 28, 2024

@davidwaroquiers i see. Thanks! Might be good to have a detailed documentation on this then.

Copy link
Member

@davidwaroquiers davidwaroquiers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Just a few things here and there. + comment regarding OTP.

src/jobflow_remote/config/base.py Show resolved Hide resolved
src/jobflow_remote/jobs/runner.py Show resolved Hide resolved
src/jobflow_remote/config/base.py Show resolved Hide resolved
src/jobflow_remote/utils/remote.py Show resolved Hide resolved
@gpetretto
Copy link
Contributor Author

After some discussion with @davidwaroquiers, we have thought that having multiple options might be confusing and it is not entirely clear if there is a benefit. So, instead, we could entirely remove the deletion process at the time the jf job rerun command is executed and the folder will instead be cleaned by the runner during the UPLOAD phase, before uploading the data. In that case we would leave this as the only option, so there is no ambiguity on how things are performed.
This will avoid entirely the MFA issue. Oher advantages of this option are that it will speed up the rerun command and it will remove a potential point of failure in a tricky process as the rerun of multiple jobs. The --no-delete option will still be available. If the deletion fails it will go in a REMOTE_ERROR state and the user might need to manually clean up the folder.
Of course the whole procedure will be described in the documentation.

@JaGeo, @utf do you see any reason to keep the file deletion in the rerun command as an alternative option?

@JaGeo
Copy link
Collaborator

JaGeo commented Oct 31, 2024

@gpetretto yes, this sounds like the best solution!

Copy link
Member

@davidwaroquiers davidwaroquiers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! One or two suggestions for comments (up to you). Also one comment about python 3.12 (maybe not needed for this specific PR ?)

src/jobflow_remote/jobs/jobcontroller.py Show resolved Hide resolved
src/jobflow_remote/jobs/data.py Outdated Show resolved Hide resolved
src/jobflow_remote/jobs/jobcontroller.py Show resolved Hide resolved
@gpetretto gpetretto merged commit fb0b196 into develop Nov 8, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Delete VASP files when rerunnning jobs? call jf from container
5 participants