Check status for simulation code in jf job list
#227
Comments
Hi @FabiPi3, in general the idea of jobflow is that if a Job finishes correctly it is marked as If your purpose is to identify Jobs that did not complete correctly here are a few options that you can consider:
Although all these options will not allow you to tell at a glance what was exactly the problem with the job just by running |
Thanks for your answer, @gpetretto. I noticed that if the job fails with a raised error, there will be no entry in the output database. I guess there is no way to change this? It wouldn't make much sense to have a raised error that also returns some data, would it? In general, I want to keep my internal error handling to have consistent entries in the output database. Sometimes a failed calculation also gives you some information you want. I tested the stored_data attribute, and it looks quite promising. I quickly implemented an option to show these also with |
From the stack trace this looks more like an issue in monty about how it deals with Enum. If I remember correctly you already made some changes to that part. Can you try just dumping a simple dictionary with an enum like in your case and check if you get the same error? Maybe a call to |
Yes I did. After some searching, I think the issue comes from jobflow. I am subclassing their ValueEnum, and see this: They apparently have a |
This is basically a feature request, which I guess is not so easy to implement. But maybe someone knows an easier way to achieve what I want.
Here is the story: I mainly run a simulation code with jobflow-remote, something like VASP or Abinit. Usually a single calculation (let's say VASP) corresponds to one jobflow job. Now, looking at the result with `jf job info` or `jf job list`, I see that the job(s) is COMPLETED. This is good, of course, but it only means that there was no Python error. Basically, I catch any error in the simulation code via Python to ensure appropriate error handling and a proper entry in the output database. What I would also like to see is a direct hint whether the actual calculation from the simulation code was successful or not.
So one question is how to pass this information to jobflow(-remote), since it is very specific information. I could imagine creating a special file `success.out` at runtime that contains a status string or similar. Another option might be to return a `Response` object with a new field `simulation_success`. But this might already be too specific. Then jobflow-remote would need to parse this and include it in the JobDoc, which is quite some overhead, I guess.
Another option would be to generate this info only locally while running the `jf job list` command. But this again requires a very specific format to be kept. In principle, I could provide a check function for my simulation code which takes the run_dir and the stdout/stderr and determines the run success. But as it needs to read different files depending on rather complex logic, one would need to download all those files, which is also not a very good option, I guess.
Any opinion?
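To make the idea more concrete, here is a rough sketch of what such a check function and the `success.out` file could look like. Everything in it is an assumption for illustration only: the function names `check_run_success` and `write_status_file`, the marker strings, and the `output.dat` file name are hypothetical, and a real code like VASP or Abinit would need its own, more complex rules. Nothing here is part of jobflow or jobflow-remote.

```python
import json
from pathlib import Path

# Hypothetical marker strings; a real simulation code needs its own rules.
ERROR_MARKERS = ("ERROR", "Segmentation fault")


def check_run_success(run_dir, stdout, stderr):
    """Return a small status dict saying whether the simulation succeeded."""
    # Collect every marker that shows up in the captured stdout/stderr.
    problems = [m for m in ERROR_MARKERS if m in stdout or m in stderr]
    # Assumed convention: the code writes a non-empty 'output.dat' on success.
    output_file = Path(run_dir) / "output.dat"
    if not output_file.is_file() or output_file.stat().st_size == 0:
        problems.append("missing or empty output.dat")
    return {"simulation_success": not problems, "problems": problems}


def write_status_file(run_dir, status):
    """Write the status dict as JSON to 'success.out' inside run_dir."""
    path = Path(run_dir) / "success.out"
    path.write_text(json.dumps(status))
    return path
```

Running the check inside the job, right after the simulation finishes, would avoid having to download files later. The resulting dict could then either be written to `success.out` or be attached to the job's `Response`, for example through the stored_data attribute mentioned in the comments.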