An error message was stored many times in my JobDB as a jobid: 'pbs_iff: cannot read reply from pbs_server\nNo Permission'. This caused db lookups to fail, e.g.
db.select_regex_id("jobid", 'pbs_iff: cannot read reply from pbs_server\nNo Permission')
because the select_job function in jobdb.py can't handle duplicate rows. I worked around it by adding a select_duplicate_jobs method and passing the first returned job to delete_job:
def select_duplicate_jobs(self, jobid):
    """Return all rows in the jobs table matching 'jobid' (possibly more than one)."""
    if not isinstance(jobid, string_types):
        print("Error in prisms_jobs.JobDB.select_duplicate_jobs(). type(jobid):", type(jobid), "expected str.")
        sys.exit()
    self.curs.execute("SELECT * FROM jobs WHERE jobid=?", (jobid,))
    dupes = self.curs.fetchall()    #pylint: disable=invalid-name
    if len(dupes) == 0:
        raise JobDBError("Error in prisms_jobs.JobDB.select_duplicate_jobs(). jobid: '"
                         + jobid + "' not found in jobs database.")
    return [CompatibilityRow(r) for r in dupes]
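For reference, this is roughly the cleanup I ran afterwards. It is only a sketch, not the exact session: it assumes delete_job can be called with a jobid keyword and that the returned rows support dict-style access, so adjust it to whatever signatures your jobdb.py actually has.

# Sketch of the cleanup (interactive session). Assumes 'db' is an open
# prisms_jobs.JobDB, that delete_job accepts a jobid keyword, and that the
# rows returned by select_duplicate_jobs support dict-style access.
bad_jobid = 'pbs_iff: cannot read reply from pbs_server\nNo Permission'

dupes = db.select_duplicate_jobs(bad_jobid)
print("found", len(dupes), "rows stored under the bogus jobid")

# pass the first returned job's jobid to delete_job; since all of the
# duplicates share the same jobid, one delete cleans them all up
db.delete_job(jobid=dupes[0]['jobid'])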
I am also wondering whether this issue could come up if the job IDs on the cluster are reset or lost because the queue crashes. I noticed in the casm-calc output that it is often finding an existing JobID, but it seems to be running fine.
Update: actually, they have all failed. Maybe this is a separate issue, but casm-calc reported that a JobID was found, printed out the list of nodes, and then hung there. Deleting the job from the db and resubmitting was successful.