First of all, thank you very much for your help with the other issues I opened for dispy.
My current problem is that I do not always get the expected return value for each job. In most cases the job runs as expected and returns the expected values, but in some scenarios it returns only None.
In those scenarios, part of the job function waits for some files to appear, using a while True loop. If the file exists from the start, the job runs normally. If the file does not exist at the start, the job seems to stop right there, even when the file appears within the allowed time, and it returns only None. At the end of the function, some details, such as the timestamps of each step, are inserted into a database, and no record is written for the jobs that fail to return the expected values.
When this happens, jobs scheduled after the affected ones usually still run normally. There is a second situation, though, in which a scheduled job hangs forever and the jobs after it are never executed. The whole system is then stuck and cannot be closed via cluster.close(), so I have to restart everything to get it working again. This can happen at any time, with no pattern that I can see (or perhaps I am just not familiar enough with this issue).
I set the loglevel to debug to get more information about the first problem, but everything looks fine. Each job produces three log lines: one for the long job id running, one for the short job id being executed, and one for the reply being received. So I have no idea what is happening. I would really appreciate it if you could suggest possible causes for both problems.
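For reference, the waiting pattern described above can be sketched roughly as follows. This is only an illustration, not the actual job code: the names wait_for_file and compute, the timeout values and the returned tuples are all hypothetical. The one thing it relies on is that a Python function which reaches its end without a return statement returns None, so giving every exit path an explicit return value makes a timeout distinguishable from a job that stopped for some other reason.

```python
import os
import time


def wait_for_file(path, timeout=60.0, poll_interval=0.5):
    """Return True once `path` exists, or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


def compute(path, timeout=60.0):
    # Hypothetical job function: every exit path returns an explicit value,
    # so the caller never has to interpret a bare None.
    if not wait_for_file(path, timeout=timeout, poll_interval=0.01):
        return ("timeout", path)
    return ("ok", os.path.getsize(path))
```

With this shape, a None result on the client side would mean that the function did not reach any of its return statements at all.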
Thank you very much for your help and guidance!
Check the job status: if the job finished without errors, it will be dispy.DispyJob.Finished; otherwise the job should be considered failed (e.g., it was cancelled or its execution raised an exception), in which case the job's stderr / exception attributes may have some useful information.
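A minimal sketch of that check, written as a plain helper so it can be tried without a cluster. The name describe_job and the shape of the returned dictionary are made up for illustration; the helper only reads the status, result, stderr and exception attributes mentioned above, and the "finished" value is passed in rather than imported.

```python
def describe_job(job, finished_status):
    """Summarize a job from its status and diagnostic attributes.

    `finished_status` is the value that means "finished without errors";
    with dispy this would be dispy.DispyJob.Finished.
    """
    if job.status == finished_status:
        return {"ok": True, "result": job.result}
    # Anything else is treated as a failure; keep whatever diagnostics exist.
    return {
        "ok": False,
        "status": job.status,
        "stderr": job.stderr,
        "exception": job.exception,
    }
```

With dispy, one would typically wait for the job first (calling job() blocks until the job is done) and then call describe_job(job, dispy.DispyJob.Finished).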
For the stuck jobs, I checked the job status, IP address, stderr and exception. The job status is 5, and the other three values are all None.
From the log files I write manually on the nodes, I can see that the jobs finished all of their computation, but according to the debug output on the node side, the node never sends the result for the job. Could you please provide some more guidance on this problem?