@jadhao @kadupitiya
Periodically, jobs I am running are terminating with exit code 143 on BigRed3.
There is no error in the log, energy is conserved, and there are no obvious code issues. Restarting the job manually generally runs past the point where the failure occurred without any problem.
Here are two examples, where the job was submitted for 20 hours.
After apparent timeout:
Slurm Job_id=254798 Name=MVM_2.15_T_298 Failed, Run time 20:00:06, FAILED, ExitCode 143
Prematurely:
Slurm Job_id=257095 Name=MVM_1.95_T_298 Failed, Run time 01:59:54, FAILED, ExitCode 143
Both of these jobs ended at exactly the same time, as did two other jobs that timed out and one more that had an apparent timeout.
Is there a bug in the code or job script that could be causing this? I have measured the memory consumption locally, and it appears to be well below the memory limit on BR3.
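For reference, exit code 143 follows the usual shell convention of 128 plus a signal number, so it corresponds to signal 15 (SIGTERM). That would suggest the jobs were terminated from outside rather than crashing on their own, though I have not confirmed what sent the signal. Here is a minimal Python sketch of that decoding; the helper name `decode_exit_code` is my own and is not part of the simulation code.

```python
import signal


def decode_exit_code(code: int) -> str:
    """Describe a shell-style exit code.

    Assumes the common convention that values above 128 mean
    "terminated by signal (code - 128)".
    """
    if code > 128:
        signum = code - 128
        try:
            # signal.Signals maps a signal number to its symbolic name
            return f"terminated by {signal.Signals(signum).name}"
        except ValueError:
            return f"terminated by unknown signal {signum}"
    return f"exited with status {code}"


print(decode_exit_code(143))
```

If that reading is right, the thing to find out is what sent SIGTERM to several jobs at the same moment.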