Keep original tmp slurm submission file as a hidden symlink #1771

Merged: 5 commits, Sep 18, 2024

@xman1979 (Contributor) commented Aug 30, 2024

Why make this change?

If we run "scontrol show job", the reported submission script points to the temporary submission file, which has already been removed, e.g.:

(jepa) [xiaodongma@rsccpu4035 xiaodongma]$ scontrol show job 4499193
JobId=4499203 JobName=xiaodongma
  ...
   Command=/home/xiaodongma/jepa-internal/xiaodongma/submission_file_e9c4eef46a24436b81d5213875f19d6c.sh
...

This can cause confusion in the Slurm ecosystem and makes it hard to integrate with other tooling that relies on parsing or post-inspecting the sbatch script.
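To illustrate the kind of tooling affected, here is a minimal sketch of a parser that pulls the Command field out of "scontrol show job" output. The function name `extract_command_path` is hypothetical and not part of submitit or Slurm; the sketch assumes the output is made of whitespace-separated `Key=Value` pairs and that the script path contains no spaces.

```python
import re
from pathlib import Path
from typing import Optional


def extract_command_path(scontrol_output: str) -> Optional[Path]:
    """Return the path in the Command= field, or None if the field is absent.

    Hypothetical helper: assumes whitespace-separated Key=Value pairs
    and a script path without spaces.
    """
    match = re.search(r"(?:^|\s)Command=(\S+)", scontrol_output)
    return Path(match.group(1)) if match else None
```

A tool built this way would then typically check `path.exists()` before reading the script, which fails when the path refers to a temporary file that has since been removed.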

Fix

This diff creates the temporary submission file as a symlink to the moved submission file.
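The idea can be sketched as follows. This is an illustration only, not the code from this diff: `move_keep_symlink` is a hypothetical helper name, and the sketch assumes the temporary file already has a dot-prefixed (hidden) name, so the symlink left at its original path is hidden too.

```python
import shutil
from pathlib import Path


def move_keep_symlink(tmp_file: Path, final_file: Path) -> Path:
    """Move tmp_file to final_file, then leave a symlink at the old path.

    Hypothetical helper for illustration. The path recorded by Slurm at
    submission time (tmp_file) keeps resolving after the move.
    """
    final_file.parent.mkdir(parents=True, exist_ok=True)
    # shutil.move also works when source and destination are on different filesystems
    shutil.move(str(tmp_file), str(final_file))
    # The old path is now free, so recreate it as a link to the moved file
    tmp_file.symlink_to(final_file.resolve())
    return tmp_file
```

Using an absolute target for the link keeps it valid regardless of the working directory from which it is later inspected.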

Test

After the fix, we can see the submission file:

scontrol show job 4499203
JobId=4499203 JobName=xiaodongma
... Command=/checkpoint/amaia/video/xiaodongma/vjepav3/arch/vjepav1/vit.l.16.m8/.submission_file_bb581d4ec3954cd9a45aa7388ad6494e.sh
...
(jepa) xiaodongma@xiaodongma-login-0:/checkpoint/amaia/video/xiaodongma/vjepav3/arch/vjepav1/vit.l.16.m8$ ll /checkpoint/amaia/video/xiaodongma/vjepav3/arch/vjepav1/vit.l.16.m8/.submission_file_bb581d4ec3954cd9a45aa7388ad6494e.sh
lrwxrwxrwx 1 xiaodongma fair_amaia_cw_video 101 Sep 17 17:48 /checkpoint/amaia/video/xiaodongma/vjepav3/arch/vjepav1/vit.l.16.m8/.submission_file_bb581d4ec3954cd9a45aa7388ad6494e.sh -> /checkpoint/amaia/video/xiaodongma/vjepav3/arch/vjepav1/vit.l.16.m8/job_1358522/1358522_submission.sh

@facebook-github-bot added the "CLA Signed" label Aug 30, 2024
@jrapin changed the title from "keep origina tmp slurm submission file" to "Keep original tmp slurm submission file" Sep 16, 2024
@jrapin (Contributor) commented Sep 17, 2024

I'd rather the submission file be hidden, as you had initially proposed, to avoid cluttering the folder (too much).

@xman1979 changed the title from "Keep original tmp slurm submission file" to "Keep original tmp slurm submission file as a hidden symlink" Sep 17, 2024
@xman1979 merged commit 59db80d into main Sep 18, 2024
2 checks passed