beech hung with all relays on relay board energized #306

Open
jessicamillar opened this issue Jan 12, 2025 · 1 comment
Labels
bug Something isn't working

Comments

Collaborator

jessicamillar commented Jan 12, 2025

beech relays energized 20250112 Date: Jan 12 2025

Biggest issue - beech hung with relays on board energized for 45 minutes

2025-01-12 09:09:58.350 Checking tls_paths_present
2025-01-12 09:09:58.350 Getting requested_names
2025-01-12 09:09:58.351 Loading layout
2025-01-12 09:09:58.391 Getting nodes run by scada
2025-01-12 09:09:58.392 Done
################################
# BELIEVE THIS WAS NOT ACTUALLY 10:02 FROM HERE ON
#######################################
2025-01-12 09:13:25.055 
2025-01-12 09:13:25.056 Env file: </home/pi/gw-scada-spaceheat-python/.env>  exists:True
2025-01-12 09:13:25.056 Settings:

2025-01-12 09:13:25.250 Checking tls_paths_present
2025-01-12 09:13:25.251 Getting requested_names
2025-01-12 09:13:25.251 Loading layout
2025-01-12 09:13:25.293 Getting nodes run by scada
2025-01-12 09:13:25.295 Done


2025-01-12 10:02:12.562 
Subscription info for <hw1.isone.me.versant.keene.beech.scada> [construction]
  Client name: <local>  topic_dst: <s2>

Other issues at beech

Since we are continuing to have problems with the beech power meter, I have moved the power-meter ShNode from primary to secondary scada. The code was in a somewhat broken state on beech2 until 11:30 am.

The beech dashboard is failing often, sometimes with a TLS error and sometimes with core dumps. See related TLS issue in proactor ... This is almost certainly due in part to the dashboard requiring power data (todo: get rid of HpHack in the dashboard), but it also looks related to an underlying gwproactor issue. Note that there are essentially no crashes on the oak and fir dashboards.

Various notes and timeline

06:15 am: Strange journalctl report of gwspaceheat-restart.service starting and deactivating

Jan 12 06:15:03 beech systemd[1]: Starting gwspaceheat-restart.service - Start gwspaceheat service if is not running; Designed to catch manually stopping and forgetting to restart service....
Jan 12 06:15:07 beech python[418550]: 2025-01-12 06:15:07.199 [relay1] sending DeEnergize to multiplexer
Jan 12 06:15:09 beech python[418550]: 2025-01-12 06:15:07.725 [pico-cycler] primary-flow pico_607636 flatlined
Jan 12 06:15:09 beech python[418550]: 2025-01-12 06:15:07.771 [pico-cycler] dist-flow2 pico_2a7e22 flatlined
Jan 12 06:15:12 beech systemd[1]: gwspaceheat-restart.service: Deactivated successfully.
Jan 12 06:15:12 beech systemd[1]: Finished gwspaceheat-restart.service - Start gwspaceheat service if is not running; Designed to catch manually stopping and forgetting to restart service.

06:20 am: First strange Scada restart

2025-01-12 06:20:10.572 ERROR in process_message
Traceback (most recent call last):
  File "/home/pi/gw-scada-spaceheat-python/gw_spaceheat/venv/lib/python3.11/site-packages/gwproactor/proactor_implementation.py", line 362, in process_messages
    await self.process_message(message)
  File "/home/pi/gw-scada-spaceheat-python/gw_spaceheat/venv/lib/python3.11/site-packages/gwproactor/proactor_implementation.py", line 477, in process_message
    self._watchdog.process_message(message)
  File "/home/pi/gw-scada-spaceheat-python/gw_spaceheat/venv/lib/python3.11/site-packages/gwproactor/watchdog.py", line 77, in process_message
    self._pat_external_watchdog()
  File "/home/pi/gw-scada-spaceheat-python/gw_spaceheat/venv/lib/python3.11/site-packages/gwproactor/watchdog.py", line 147, in _pat_external_watchdog
    subprocess.run(self._pat_external_watchdog_process_args, check=True)  # noqa: S603
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pi/.pyenv/versions/3.11.9/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['systemd-notify', '--pid=418550', 'WATCHDOG=1']' returned non-zero exit status 1.
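
The traceback above shows `subprocess.run(..., check=True)` raising `CalledProcessError` when `systemd-notify` exits with status 1, and that exception surfacing through `process_message`. As a minimal sketch of one way to keep a failed pat from propagating out of message processing, the call could be wrapped and the failure logged. This is not the gwproactor code: the function name `pat_external_watchdog`, its arguments, and the 5-second timeout are hypothetical choices made for illustration.

```python
import logging
import subprocess
import sys
from typing import Sequence

logger = logging.getLogger(__name__)


def pat_external_watchdog(args: Sequence[str], timeout_s: float = 5.0) -> bool:
    """Run the external pat command (hypothetical helper, not gwproactor's API).

    Returns True if the command exited with status 0, False otherwise.
    Failures are logged instead of raised, so the caller's message loop
    is not interrupted by a single failed pat.
    """
    try:
        subprocess.run(list(args), check=True, timeout=timeout_s)
    except subprocess.CalledProcessError as e:
        # Non-zero exit status, as in the traceback above.
        logger.warning("watchdog pat failed: %s returned %s", e.cmd, e.returncode)
        return False
    except (subprocess.TimeoutExpired, OSError) as e:
        # Command hung past the timeout, or could not be started at all.
        logger.warning("watchdog pat could not complete: %r", e)
        return False
    return True


if __name__ == "__main__":
    # Stand-in for systemd-notify: a child process that exits with status 1.
    ok = pat_external_watchdog([sys.executable, "-c", "raise SystemExit(1)"])
    print("pat succeeded" if ok else "pat failed (logged, not raised)")
```

Whether to swallow the error is a design decision. If pats keep failing, the caller would probably want to count consecutive failures and escalate instead of continuing silently.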

06:20 -> 06:47 am: Scada stuck in startup

2025-01-12 06:20:25.349 Getting nodes run by scada
2025-01-12 06:20:27.394 Done
2025-01-12 06:47:49.758 
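
The hole between the `06:20:27.394 Done` line and the next entry at `06:47:49.758` is the kind of gap that is easy to miss by eye in a long log. Below is a minimal sketch for flagging such gaps. It assumes each line of interest begins with a `YYYY-MM-DD HH:MM:SS.mmm` timestamp, as in the proactor log excerpts here. The name `find_gaps` and the 60-second threshold are illustrative choices, not part of the scada code.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Matches a leading timestamp such as "2025-01-12 06:20:27.394".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def find_gaps(
    lines: Iterable[str],
    threshold: timedelta = timedelta(seconds=60),
) -> List[Tuple[datetime, datetime]]:
    """Return (before, after) pairs where consecutive timestamps differ by more than threshold.

    Lines without a leading timestamp (tracebacks, settings dumps) are skipped.
    """
    gaps: List[Tuple[datetime, datetime]] = []
    prev = None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts))
        prev = ts
    return gaps


if __name__ == "__main__":
    sample = [
        "2025-01-12 06:20:25.349 Getting nodes run by scada",
        "2025-01-12 06:20:27.394 Done",
        "2025-01-12 06:47:49.758 ",
    ]
    for before, after in find_gaps(sample):
        print(f"gap of {after - before} between {before} and {after}")
```

Lines from journalctl output start with a `Jan 12 06:15:07 beech ...` prefix and would not match this pattern as written, so the sketch applies to the proactor log format only.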
jessicamillar added the bug label Jan 12, 2025
Collaborator Author

jessicamillar commented Jan 12, 2025

I asked George to stop the scada code as soon as he ssh'd in (I had just realized that I was logged out of Tailscale). Attached is the proactor.log file that I got at that point, before restarting the Scada:
beech.up_to_just_after_power_cycle.log

Also attached are journalctl logs (from journalctl --since "2025-01-12 06:00:00" --until "2025-01-12 11:00:00" > journalctl.log)
jouranlctl.log
