Myjobs app is not starting the job in the job's directory #3800

Closed
cipharius opened this issue Sep 17, 2024 · 2 comments · Fixed by #3913

Comments

@cipharius

I've been troubleshooting this for a while, thinking something was wrong with my configuration, but it seems that when myjobs ResourceMgrAdapter queues the job, it doesn't pass the working directory to the job system adapter.

Up until this point, the information about script's directory is retained:
https://github.com/OSC/ondemand/blob/master/apps/myjobs/app/models/resource_mgr_adapter.rb#L37-L46

Once submit is invoked on the job adapter, the working directory information is lost: the job does not run in the script's directory, and that directory does not appear in the job's environment.

If I add workdir: Dir.pwd to the Script.new argument list, the jobs are run in the script's directory instead of the user's home directory.
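To illustrate the change described above, here is a minimal, self-contained sketch. It does not use ood_core: FakeScript and build_script are hypothetical stand-ins for the real Script object and for the adapter code linked earlier. It only shows the idea of capturing Dir.pwd while inside the job directory and carrying it along as an explicit workdir value.

```ruby
require "tmpdir"

# Hypothetical stand-in for the script object handed to the job adapter.
# The real class lives in ood_core; only the workdir idea is modelled here.
FakeScript = Struct.new(:content, :workdir, keyword_init: true)

# Build the script while inside the job directory and record that directory
# explicitly, so it survives even if the submit command later runs elsewhere.
def build_script(job_dir, content)
  Dir.chdir(job_dir) do
    FakeScript.new(content: content, workdir: Dir.pwd)
  end
end

Dir.mktmpdir do |dir|
  script = build_script(dir, "#!/bin/bash\nhostname\n")
  # Prints the job directory rather than the caller's current directory.
  puts script.workdir
end
```

Because the directory is stored on the object, it no longer depends on the process's current directory at submit time.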

I was not sure if this is the correct place to fix this issue, so I'm opening an issue instead of a PR.

@osc-bot osc-bot added this to the Backlog milestone Sep 17, 2024
@johrstrom
Contributor

Luckily, I've already been through this on discourse. I'm assuming you use a submit_host or some wrapper to submit jobs somewhere other than the OOD VM?

This is happening because we do chdir into the right directory while submitting the job, but since you're SSHing somewhere else to issue the job submission command, the CWD is HOME.

https://discourse.openondemand.org/t/simple-question-execute-python-code-on-a-lsf-submit-host/3560/23
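A rough sketch of why the local chdir gets lost, assuming a Slurm cluster reached over SSH. The helper remote_submit_argv, the host name, and the paths are hypothetical and are not how ood_core builds its commands. The only real option used is sbatch's -D, which sets the job's working directory.

```ruby
require "shellwords"

# Hypothetical helper, not part of ood_core: builds the argv used to submit
# a Slurm job through SSH on a separate submit host.
def remote_submit_argv(submit_host, script_path, workdir: nil)
  sbatch = ["sbatch"]
  # sbatch's -D option sets the job's working directory explicitly.
  sbatch += ["-D", workdir] if workdir
  sbatch << script_path
  # The remote shell starts in the user's home directory, so a local
  # Dir.chdir has no effect on where this command runs.
  ["ssh", submit_host, Shellwords.join(sbatch)]
end

# Without workdir, nothing tells the remote side where the job should run.
p remote_submit_argv("login01", "job.sh")
# With workdir, the directory travels inside the command itself.
p remote_submit_argv("login01", "job.sh", workdir: "/home/alice/jobs/42")
```

The point of the sketch is that anything not written into the remote command line is invisible to the submit host, including the submitting process's current directory.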

> I was not sure if this is the correct place to fix this issue, so I'm opening an issue instead of a PR.

Either is fine by me! Even in debugging that discourse topic, it didn't occur to me to just specify the workdir instead of relying on Dir.chdir.

PRs welcome!

@cipharius
Author

Thanks for the quick reply!

Yeah, that is correct, I am running the jobs on a remote machine that shares the same home directory subtree. Open OnDemand is supposed to be a frontend to a Slurm HPC cluster that was previously accessed via CLI only.

In that case I'll open a PR for the explicit workdir parameter. It wouldn't affect those who run batch jobs from the OOD machine itself, though it would change behaviour for those who have gotten used to jobs running under the home directory on a remote node.

I was mostly surprised that this is how it has worked all this time and that I couldn't find posts complaining about this specific issue.
