Instance <DbAuthInfo> is not bound to a Session error #6024

Closed
2 tasks done
unkcpz opened this issue May 17, 2023 · 5 comments
Labels
type/bug type/duplicate close issue when applying duplicate label

Comments

@unkcpz
Member

unkcpz commented May 17, 2023

Describe the bug

When there are more than 500 calcjobs in the process list, some processes quickly run into the exception below, and verdi process play -a does not help.

+-> ERROR at 2023-05-17 00:08:14.484628+02:00
 | Traceback (most recent call last):
 |   File "/home/jyu/micromamba/envs/aiida-sssp-unstable/lib/python3.9/site-packages/aiida/engine/utils.py", line 187, in exponential_backoff_retry
 |     result = await coro()
 |   File "/home/jyu/micromamba/envs/aiida-sssp-unstable/lib/python3.9/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 192, in do_update
 |     with job_manager.request_job_info_update(authinfo, job_id) as update_request:
 |   File "/home/jyu/micromamba/envs/aiida-sssp-unstable/lib/python3.9/contextlib.py", line 119, in __enter__
 |     return next(self.gen)
 |   File "/home/jyu/micromamba/envs/aiida-sssp-unstable/lib/python3.9/site-packages/aiida/engine/processes/calcjobs/manager.py", line 286, in request_job_info_update
 |     with self.get_jobs_list(authinfo).request_job_info_update(job_id) as request:
 |   File "/home/jyu/micromamba/envs/aiida-sssp-unstable/lib/python3.9/contextlib.py", line 119, in __enter__
 |     return next(self.gen)
 |   File "/home/jyu/micromamba/envs/aiida-sssp-unstable/lib/python3.9/site-packages/aiida/engine/processes/calcjobs/manager.py", line 167, in request_job_info_update
 |     self._ensure_updating()
 |   File "/home/jyu/micromamba/envs/aiida-sssp-unstable/lib/python3.9/site-packages/aiida/engine/processes/calcjobs/manager.py", line 195, in _ensure_updating
 |     self._get_next_update_delay(),
 |   File "/home/jyu/micromamba/envs/aiida-sssp-unstable/lib/python3.9/site-packages/aiida/engine/processes/calcjobs/manager.py", line 230, in _get_next_update_delay
 |     minimum_interval = self.get_minimum_update_interval()
 |   File "/home/jyu/micromamba/envs/aiida-sssp-unstable/lib/python3.9/site-packages/aiida/engine/processes/calcjobs/manager.py", line 79, in get_minimum_update_interval
 |     return self._authinfo.computer.get_minimum_job_poll_interval()
 |   File "/home/jyu/micromamba/envs/aiida-sssp-unstable/lib/python3.9/site-packages/aiida/orm/authinfos.py", line 87, in computer
 |     return entities.from_backend_entity(computers.Computer, self._backend_entity.computer)
 |   File "/home/jyu/micromamba/envs/aiida-sssp-unstable/lib/python3.9/site-packages/aiida/storage/psql_dos/orm/authinfos.py", line 74, in computer
 |     return self.backend.computers.ENTITY_CLASS.from_dbmodel(self.model.dbcomputer, self.backend)
 |   File "/home/jyu/micromamba/envs/aiida-sssp-unstable/lib/python3.9/site-packages/aiida/storage/psql_dos/orm/utils.py", line 84, in __getattr__
 |     if self.is_saved() and self._is_mutable_model_field(item) and not self._in_transaction():
 |   File "/home/jyu/micromamba/envs/aiida-sssp-unstable/lib/python3.9/site-packages/aiida/storage/psql_dos/orm/utils.py", line 110, in is_saved
 |     return self._model.id is not None
 |   File "/home/jyu/micromamba/envs/aiida-sssp-unstable/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 482, in __get__
 |     return self.impl.get(state, dict_)
 |   File "/home/jyu/micromamba/envs/aiida-sssp-unstable/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 942, in get
 |     value = self._fire_loader_callables(state, key, passive)
 |   File "/home/jyu/micromamba/envs/aiida-sssp-unstable/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 973, in _fire_loader_callables
 |     return state._load_expired(state, passive)
 |   File "/home/jyu/micromamba/envs/aiida-sssp-unstable/lib/python3.9/site-packages/sqlalchemy/orm/state.py", line 712, in _load_expired
 |     self.manager.expired_attribute_loader(self, toload, passive)
 |   File "/home/jyu/micromamba/envs/aiida-sssp-unstable/lib/python3.9/site-packages/sqlalchemy/orm/loading.py", line 1369, in load_scalar_attributes
 |     raise orm_exc.DetachedInstanceError(
 | sqlalchemy.orm.exc.DetachedInstanceError: Instance <DbAuthInfo at 0x7f3982390640> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: https://sqlalche.me/e/14/bhk3)
+-> WARNING at 2023-05-17 00:08:14.491510+02:00
 | maximum attempts 5 of calling do_update, exceeded
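
For context, the traceback enters through exponential_backoff_retry, and the final warning says that 5 attempts at do_update were exhausted. The sketch below illustrates that general retry pattern; it is not the aiida-core implementation, and retry_with_backoff, DetachedError, and the interval values are hypothetical names and numbers chosen for the example. It shows why retrying cannot help when the failure is persistent, as it would be if a worker keeps holding the same detached instance: every attempt fails identically and the last exception propagates.

```python
import asyncio


class DetachedError(Exception):
    """Stand-in for sqlalchemy.orm.exc.DetachedInstanceError (hypothetical)."""


async def retry_with_backoff(coro_factory, initial_interval=0.01, max_attempts=5):
    """Await coro_factory() until it succeeds, doubling the wait after each failure.

    Illustration only; the real helper in aiida.engine.utils may differ.
    """
    interval = initial_interval
    for attempt in range(1, max_attempts + 1):
        try:
            return await coro_factory()
        except Exception:
            if attempt == max_attempts:
                # Out of attempts: re-raise so the caller can react.
                raise
            await asyncio.sleep(interval)
            interval *= 2


def run_demo():
    """Count how many times a persistently failing update is attempted."""
    calls = []

    async def do_update():
        # Fails on every call, like an update that touches a detached instance.
        calls.append(1)
        raise DetachedError("instance is not bound to a Session")

    try:
        asyncio.run(retry_with_backoff(do_update))
    except DetachedError:
        pass
    return len(calls)
```

Calling run_demo() attempts the update once per allowed attempt and then gives up, which matches the shape of the log above: repeated identical errors followed by a "maximum attempts exceeded" warning.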

Steps to reproduce

Steps to reproduce the behavior:

This only happened when I submitted 40 of my pseudopotential workchains, each of which spawns 100 small pw.x calculations. It is therefore not easy to reproduce from scratch. Interestingly, since my process list now contains many processes that were paused after reaching the maximum of 5 attempts, I can reproduce it by submitting just 10 of my workchains.

Expected behavior

Your environment

  • Operating system [e.g. Linux]:
  • aiida-core version [e.g. 2.3.0]:

Other relevant software versions, e.g. Postgres & RabbitMQ

Additional context

@unkcpz
Member Author

unkcpz commented May 17, 2023

I just found that I encountered this before in #4596, and that it was also reported by @sphuber in #1292.

EDIT: According to what I reported in #4596, I need to restart not only the daemon but also the DB services. In any case, it is a very annoying issue that prevents me from running "real" high-throughput calculations; I have to use a submission control script to make sure that no more than 10 workchains run at the same time.
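
A submission control script of the kind described above might look roughly like the following. This is a minimal sketch, not the script actually used: submit_throttled and its parameters are hypothetical, and count_active and submit are caller-supplied callables. With AiiDA they would presumably wrap a query for unfinished workchains and a call to aiida.engine.submit, but that wiring is an assumption and is left out here.

```python
import time


def submit_throttled(pending, count_active, submit,
                     max_active=10, poll_seconds=60, sleep=time.sleep):
    """Submit items from pending, keeping at most max_active running at once.

    count_active() returns the number of currently running items and
    submit(item) launches one item; both are hypothetical callables.
    """
    queue = list(pending)
    submitted = []
    while queue:
        free_slots = max_active - count_active()
        if free_slots <= 0:
            # Everything is busy: wait before checking again.
            sleep(poll_seconds)
            continue
        # Fill the free slots, but never submit more than remains in the queue.
        for _ in range(min(free_slots, len(queue))):
            submitted.append(submit(queue.pop(0)))
    return submitted
```

The sleep function is injectable so the loop can be exercised without real waiting; in normal use the default time.sleep applies and poll_seconds controls how often the running count is re-checked.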

@sphuber added the type/duplicate label May 17, 2023
@sphuber
Contributor

sphuber commented May 17, 2023

I am pretty sure you only need to reset the daemon, not the DB service. But I agree, this needs to be fixed. Let's continue the discussion in the other issue.

@sphuber closed this as not planned May 17, 2023
@unkcpz
Member Author

unkcpz commented May 17, 2023

The problem is that I ran verdi process play -a (after restarting the daemon and the DB service) and all paused processes restarted, but they threw the same error after running for a while.

@sphuber
Contributor

sphuber commented May 17, 2023

Did you run verdi daemon restart --reset? Also, can you make sure that you don't have any "rogue" daemon processes running? Stop the daemon, then run ps aux | grep verdi and make sure there are no daemon workers running. If there are, they might still be picking up jobs, and if they have an inconsistent session, they will produce the same error again.
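
The ps aux | grep verdi check can also be scripted. The sketch below is only an illustration: find_verdi_processes and list_verdi_processes are hypothetical helper names, and the code assumes the usual 11-column ps aux layout with the PID in the second column and the full command last. It does not try to recognise what a daemon worker's command line looks like; it simply reports every process whose command mentions verdi, excluding grep itself.

```python
import subprocess


def find_verdi_processes(ps_output):
    """Return (pid, command) pairs for `ps aux` lines whose command mentions verdi."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        # Split into at most 11 fields so the command keeps its internal spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        pid, command = int(fields[1]), fields[10]
        if "verdi" in command and "grep" not in command:
            matches.append((pid, command))
    return matches


def list_verdi_processes():
    """Run `ps aux` and filter its output for verdi processes."""
    output = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
    return find_verdi_processes(output)
```

Run list_verdi_processes() after stopping the daemon: any entries it still returns are candidates for the leftover workers described above and should be inspected before restarting.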

@unkcpz
Member Author

unkcpz commented May 19, 2023

@sphuber, I encountered it again and restarted the daemon cleanly; all the processes are back and working fine. Thanks! I guess you were correct that I had not made sure the daemon was fully restarted.
