Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stops working randomly #12

Open
peyloride opened this issue Jan 25, 2024 · 16 comments
Open

Stops working randomly #12

peyloride opened this issue Jan 25, 2024 · 16 comments
Labels
bug Something isn't working

Comments

@peyloride
Copy link

peyloride commented Jan 25, 2024

Running Archlinux + Hyprland.

I'm using bundled awatcher and starting it like:

exec-once = telegram-desktop & awatcher

It starts well and runs at first, then randomly it crashes. I don't have a log or something unfortunately, if you can guide me how to get it I can provide that.

Example:
image

as you can see I was not AFK and was on meet but all of a sudden awatcher stops recording. Weird thing is the server works all the time, I can access localhost:5600 all the time.

This is an example of random day, it happens almost every day.

@2e3s
Copy link
Owner

2e3s commented Jan 25, 2024

Hello! Maybe try to run this manually with logging awatcher -vvv and see if it eventually fails as well? Hunting for logs from exec-once may be unnecessarily challenging.

If it works with a manual run and the problem exists only within exec-once, show the lines from wherever exec-once gets it posted:

exec-once = awatcher -vvv

Your system is probably using systemd, so this might work:
journalctl --user --no-pager --since "1 day ago" | grep awatcher
or this
sudo grep -i awatcher /var/log/syslog
If it doesn't post logs in your setup, you can make it a service like described in #11

as you can see I was not AFK and was on meet but all of a sudden awatcher stops recording

It posted "afk" which means the program didn't crash. So there was a problem getting the window data, maybe Hyprland didn't return it in some conditions. -vvv has enough level of debug output to understand what's going on, I guess.

@jopereira
Copy link

A also see this happening, with both standalone and bundled versions. I noticed the following:

  • it usually happens when the machine is suspended/resumed
  • at the same time, I stop getting messages like these:

[2024-02-12 08:28:02.469725 DEBUG watchers::watchers::gnome_window] Changed window app_id=...

  • and start geting messages like these:

[2024-02-12 09:15:17.336292 WARN aw_transform::flood] Gap was of negative duration and could NOT be safely merged (-PT183.376476271Ss). This warning will only show once per batch.
[2024-02-12 09:15:17.336310 WARN aw_transform::flood] Gap was of negative duration (-PT388.971761192Ss), but could be safely merged. This error will only show once per batch.

@luisgerhorst
Copy link
Contributor

luisgerhorst commented Mar 13, 2024

I can confirm what @jopereira reports:

$ journalctl --user -u aw-awatcher
<snip>
Mar 13 08:58:30 luis-tp bash[4290]: [2024-03-13 07:58:30.344889 DEBUG watchers::watchers::gnome_window] Changed window app_id="Emacs", title="..."
Mar 13 08:58:30 luis-tp bash[4290]: [2024-03-13 07:58:30.344931 TRACE watchers::report_client] Reporting app_id: Emacs, title: ...
Mar 13 08:58:30 luis-tp bash[4290]: [2024-03-13 07:58:30.677704 TRACE watchers::watchers::idle] Reporting as not idle
Mar 13 08:58:31 luis-tp bash[4290]: [2024-03-13 07:58:31.375662 TRACE watchers::report_client] Reporting app_id: Emacs, title: ...
Mar 13 08:58:32 luis-tp bash[4290]: [2024-03-13 07:58:32.416422 TRACE watchers::report_client] Reporting app_id: Emacs, title: ...
Mar 13 08:58:33 luis-tp bash[4290]: [2024-03-13 07:58:33.459502 TRACE watchers::report_client] Reporting app_id: Emacs, title: ...
Mar 13 08:58:35 luis-tp bash[4290]: [2024-03-13 07:58:35.700555 TRACE watchers::watchers::idle] Reporting as not idle
Mar 13 08:58:40 luis-tp bash[4290]: [2024-03-13 07:58:40.720520 TRACE watchers::watchers::idle] Reporting as not idle
Mar 13 08:58:45 luis-tp bash[4290]: [2024-03-13 07:58:45.743974 TRACE watchers::watchers::idle] Reporting as not idle
<snip>
$ journalctl --user -u aw-server
<snip>
Mar 12 17:56:00 luis-tp bash[4289]: 2024-03-12 17:56:00 [INFO ]: Received heartbeat after pulse window, inserting as new event. (bucket: aw-watcher-emacs_luis-tp)  (aw_server.api:279)
Mar 12 18:14:46 luis-tp bash[4289]: 2024-03-12 18:14:46 [INFO ]: Received heartbeat after pulse window, inserting as new event. (bucket: aw-watcher-window_luis-tp)  (aw_server.api:279)
Mar 12 18:14:51 luis-tp bash[4289]: 2024-03-12 18:14:51 [WARNING]: Gap was of negative duration and could NOT be safely merged (-181.052s). This warning will only show once per batch.  (aw_transform.flood:54)
Mar 12 18:14:51 luis-tp bash[4289]: 2024-03-12 18:14:51 [WARNING]: Gap was of negative duration but could be safely merged (-1204.95s). This message will only show once per batch.  (aw_transform.flood:46)
<snip>
Mar 12 18:15:16 luis-tp bash[4289]: 2024-03-12 18:15:16 [WARNING]: Gap was of negative duration and could NOT be safely merged (-184.073s). This warning will only show once per batch.  (aw_transform.flood:54)
Mar 13 08:49:30 luis-tp bash[4289]: 2024-03-13 08:49:30 [INFO ]: Received heartbeat after pulse window, inserting as new event. (bucket: aw-watcher-web-firefox)  (aw_server.api:279)
Mar 13 08:49:32 luis-tp bash[4289]: 2024-03-13 08:49:32 [INFO ]: Received heartbeat after pulse window, inserting as new event. (bucket: aw-watcher-afk_luis-tp)  (aw_server.api:279)
Mar 13 08:52:32 luis-tp bash[4289]: 2024-03-13 08:52:32 [INFO ]: Received heartbeat after pulse window, inserting as new event. (bucket: aw-watcher-window_luis-tp)  (aw_server.api:279)
Mar 13 08:59:15 luis-tp bash[4289]: 2024-03-13 08:59:15 [WARNING]: Gap was of negative duration but could be safely merged (-15.748s). This message will only show once per batch.  (aw_transform.flood:46)
Mar 13 08:59:15 luis-tp bash[4289]: 2024-03-13 08:59:15 [WARNING]: Gap was of negative duration and could NOT be safely merged (-181.011s). This warning will only show once per batch.  (aw_transform.flood:54)

Restarting aw-awatcher resolves it. When using systemd, here's a dirty workaround to limit the impact of the issue (restart awatcher every hour unconditionally):

[Service]
Type=simple
# Verbosity level: -v for warnings, -vv for info, -vvv for debug, -vvvv for trace
ExecStart=/bin/bash -c 'aw-awatcher -vvvv'
Restart=always
RuntimeMaxSec=1h

@2e3s
Copy link
Owner

2e3s commented Mar 16, 2024

it usually happens when the machine is suspended/resumed

I'm reading your reports here, but I can't reproduce it, unfortunately. Please, add more information about your environment and how you run the program.

@luisgerhorst
Copy link
Contributor

Here you go:

Hardware Information:

  • Hardware Model: Lenovo ThinkPad X1 Carbon Gen 9
  • Processor: 11th Gen Intel® Core™ i7-1185G7 × 8
  • Graphics: Intel® Xe Graphics (TGL GT2)

Software Information:

  • Firmware Version: N32ET90W (1.66 )
  • OS Name: Fedora Linux 39 (Workstation Edition)
  • GNOME Version: 45.4
  • Windowing System: Wayland
  • Kernel Version: Linux 6.7.9-200.fc39.x86_64

In extensions, I see a 6 behind the name of https://github.com/flexagoon/focused-window-dbus, not sure if that is the version.

> aw-awatcher --version
Activity Watcher 0.2.4
> cat ~/.config/systemd/user/aw-awatcher.service 
[Unit]
Requires=aw-server.service
After=aw-server.service

[Service]
Type=simple
# Verbosity level: -v for warnings, -vv for info, -vvv for debug, -vvvv for trace
ExecStart=/bin/bash -c 'aw-awatcher -vvvv'
Restart=always
RestartSec=60
RuntimeMaxSec=1h
Nice=18
IOSchedulingPriority=6
# Was not able to test this on a system where it has an effect:
# CPUQuota=10%

[Install]
WantedBy=default.target
> cat ~/.config/systemd/user/aw-server.service  
[Unit]
After=network.target

[Service]
Type=simple
ExecStart=/bin/bash -c 'aw-server'
Restart=on-failure
RestartSec=60
Nice=18
IOSchedulingPriority=6
# Was not able to test this on a system where it has an effect:
# CPUQuota=10%

[Install]
WantedBy=default.target

I also get vastly wrong usage times in the overview of the web interface.

@luisgerhorst
Copy link
Contributor

luisgerhorst commented May 3, 2024

Just wanted to let you know that I've now built awatcher from source (all worked like a charm) and disabled some of the other watchers. I've now been running that debug build for 2 weeks and have not been able to reproduce the issue since. I will continue to debug and try to pinpoint it.

Here's the changes I made: luisgerhorst@26d2f2e - However, it could also be just a coincidence at this point...

@2e3s
Copy link
Owner

2e3s commented May 3, 2024

@luisgerhorst thanks, I'm looking at your changes and I don't see anything that could prevent the aforementioned symptoms, so maybe a coincidence indeed. Having it consistently reproduced would certainly help solving it once and for all.

@2e3s 2e3s added the bug Something isn't working label May 8, 2024
@jopereira
Copy link

I've been using @luisgerhorst's fork since May 3, and the problem has not happened since then.

@2e3s
Copy link
Owner

2e3s commented May 16, 2024

@luisgerhorst why could it be? E.g. why have you added timeouts in a couple places, and removed the reconnect attempts for Gnome watchers?
I see that your changes only affect the watcher loading/initializing but I've got an impression that the failures happen long after loading, during normal functioning.

@zbw8388
Copy link

zbw8388 commented May 19, 2024

I also noticed this issue and there seems to be two issues at play:

  1. Watcher fails to return from watcher.run_iteration (fixed by timeout)
  2. For wayland apps, after suspending for a relatively long period of time, awatcher starts to raise [2024-05-19 22:04:58.559502 �[31mERROR�[0m watchers::watchers] Error on active window iteration: GetPropertyReply[_NET_WM_NAME] failed (should be fixed by commenting out X11 window watcher but have yet to test it)

@luisgerhorst
Copy link
Contributor

I have now cleaned up my initial mess (luisgerhorst@7af01ca), and will cont. to bisect it. Just takes some time as I need multiple days to be sure it's still fixed for each version.

@luisgerhorst
Copy link
Contributor

luisgerhorst commented May 21, 2024

@zbw8388 wrote:

I also noticed this issue and there seems to be two issues at play:

1. Watcher fails to return from `watcher.run_iteration` (fixed by timeout)

If this is true, maybe we should further bisect what in run_iteration fails to return?

2. For wayland apps, after suspending for a relatively long period of time, awatcher starts to raise `[2024-05-19 22:04:58.559502 �[31mERROR�[0m watchers::watchers] Error on active window iteration: GetPropertyReply[_NET_WM_NAME] failed` (should be fixed by commenting out X11 window watcher but have yet to test it)

Great!

@2e3s
Copy link
Owner

2e3s commented May 21, 2024

If this is true, maybe we should further bisect what in run_iteration fails to return?

If you use Gnome and given the symptoms, it means that DBus call to org.gnome.shell.extensions.FocusedWindow hangs at some moment. There is only that call (and reporting to AW server).

timeout(watcher_type.sleep_time(&client.config), watcher.run_iteration(&client)).await

@luisgerhorst Your change looks good and universal to hopefully exclude similar problems in this and other watchers. It's simple and free to try anyway. I can commit the change in a few days for 0.2.7, or preferably merge your PR if you push it (with other watchers enabled). Thanks regardless ❤️

@2e3s
Copy link
Owner

2e3s commented May 24, 2024

I have released the possible fix for the perceived Gnome issue https://github.com/2e3s/awatcher/releases/tag/v0.2.7

For wayland apps, after suspending for a relatively long period of time, awatcher starts to raise [2024-05-19 22:04:58.559502 �[31mERROR�[0m watchers::watchers] Error on active window iteration: GetPropertyReply[_NET_WM_NAME] failed (should be fixed by commenting out X11 window watcher but have yet to test it)

@zbw8388 I think this is another possible problem, which is related to X11 (otherwise X11 window watcher wouldn't start). Not sure what you mean by wayland apps though. If you'll notice such a buggy behaviour on the newest version, please, open an issue with a description.

@luisgerhorst
Copy link
Contributor

Thanks! I'm testing it and will report back if I observe any issues.

@42willow
Copy link

Issue doesn't occur for me on v0.2.7 of awatcher (installed as module) with hyprland v0.40.0-1
image
(my computer was suspended at the blank point in the middle)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants