Gaps in Graphs for Time Ranges Beyond 1 Hour (12h, 24h, etc.) #419

Open · Korveck14 opened this issue Jan 17, 2025 · 7 comments
Labels: troubleshooting (Maybe bug, maybe not)

@Korveck14 commented Jan 17, 2025

I’ve noticed gaps in the graphs when viewing data over extended time ranges such as 12 hours, 24 hours, days, or weeks. However, when viewing data for shorter time periods (e.g., 1 hour), the graphs display correctly without any gaps.

Example screenshots:

12h view with gaps: [Image]

1h view without gaps: [Image]

I checked the agent logs for the times corresponding to the gaps, and there is no indication of the agent being down or disconnected. The logs confirm that data was collected during those periods.

This issue seems specific to how the data is visualized for extended time ranges.

Is this a known issue with how data is aggregated or displayed for longer time ranges?
Could this be related to the database or data retention settings?
Are there any troubleshooting steps I should try to resolve this?

Thank you for looking into this!

@henrygd (Owner) commented Jan 17, 2025

That's very strange!

The 12 hour chart uses 10m records that are created by averaging the 1m records every ten minutes. There must be at least nine 1m records created in the preceding ten minutes, or the 10m record is skipped.

The 1 hour chart has some margin for error built in: it will only show a gap if the time between 1m records is longer than 90 seconds, IIRC.
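
For illustration, here is a minimal Go sketch of that aggregation rule (the Record type and function names are hypothetical, not Beszel's actual implementation):

package main

import (
	"fmt"
	"time"
)

// Record is a hypothetical stand-in for a Beszel 1m system_stats record.
type Record struct {
	Created time.Time
	CPU     float64
}

// tenMinuteAverage mirrors the rule described above: a 10m record is only
// created when at least nine 1m records fall in the preceding ten minutes;
// otherwise the block is skipped, which shows up as a gap in the 12h chart.
func tenMinuteAverage(oneMin []Record, blockEnd time.Time) (Record, bool) {
	start := blockEnd.Add(-10 * time.Minute)
	var sum float64
	count := 0
	for _, r := range oneMin {
		if r.Created.After(start) && !r.Created.After(blockEnd) {
			sum += r.CPU
			count++
		}
	}
	if count < 9 { // drifting 1m records can push a block below the threshold
		return Record{}, false
	}
	return Record{Created: blockEnd, CPU: sum / float64(count)}, true
}

func main() {
	blockEnd := time.Date(2025, 1, 17, 19, 0, 0, 0, time.UTC)
	var recs []Record
	// Simulate 1m records drifting ~7s long per interval: only 8 land in the block.
	for i := 0; i < 8; i++ {
		recs = append(recs, Record{
			Created: blockEnd.Add(time.Duration(i*67-540) * time.Second),
			CPU:     50,
		})
	}
	if _, ok := tenMinuteAverage(recs, blockEnd); !ok {
		fmt.Println("10m record skipped: fewer than nine 1m records in the block")
	}
}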

Can you please check the system_stats table in PocketBase to make sure the 1m records are being created exactly one minute apart with very little drift? Search for a specific system and record type like this:

system.name = 'example' && (type = '1m' || type = '10m')

For every ten-minute block, a 10m record should be created if there are at least nine 1m records since the last 10m record was created. If the 1m record creation is drifting, there may not be enough 1m records being created for some blocks.
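
If you export the created timestamps from that query, a small helper along these lines (again a hypothetical sketch, not part of Beszel) can flag drifting intervals:

package main

import (
	"fmt"
	"time"
)

// checkDrift prints any interval between consecutive 1m records that
// deviates from the expected 60s cadence by more than tol.
func checkDrift(created []time.Time, tol time.Duration) {
	for i := 1; i < len(created); i++ {
		gap := created[i].Sub(created[i-1])
		if d := gap - time.Minute; d > tol || d < -tol {
			fmt.Printf("drift before %s: interval was %s\n",
				created[i].Format(time.RFC3339), gap)
		}
	}
}

func main() {
	base := time.Date(2025, 1, 17, 18, 50, 0, 0, time.UTC)
	stamps := []time.Time{
		base,
		base.Add(60 * time.Second),
		base.Add(127 * time.Second), // 67s interval: drift
		base.Add(187 * time.Second),
	}
	checkDrift(stamps, 5*time.Second)
}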

henrygd added the troubleshooting label on Jan 17, 2025
@Korveck14 (Author) commented Jan 17, 2025

Based on the system_stats table, it seems that 10-minute records are occasionally not being created. Here's an example:

[Image]

Between 18:50 UTC and 19:10 UTC, the 10-minute records for 18:50 and 19:10 are present, but the one for 19:00 is missing. The missing record is also reflected in the graph (note that the graph is displayed in local time, which is one hour ahead).

[Image]

Could this issue be related to drift in the 1-minute records?

@henrygd (Owner) commented Jan 17, 2025

Yes, normally the 10m records should be created at exactly :00, :10, etc., and there should not be so much drift in the creation time of the 1m records.

Are you running the hub with Docker, and do you have a TZ env var set by any chance?

If you have more than one system being monitored, are you seeing the same behavior with all systems?

Also, can you run docker version on the agent system and paste the output here please?
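
To check the TZ variable without entering the container, something like this should work (assuming the hub container is named beszel):

docker inspect -f '{{.Config.Env}}' beszel

This prints the container's environment list, so you can see whether TZ appears in it.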

Here's what my personal instance looks like:

[Image]

@Korveck14 (Author) commented Jan 17, 2025

This is my configuration in docker-compose:

services:
  beszel:
    image: henrygd/beszel
    container_name: beszel
    volumes:
      - /mnt/docker/config/beszel:/beszel_data
    restart: unless-stopped
    labels:
      tsdproxy.enable: "true"

  beszel-agent:
    image: henrygd/beszel-agent
    container_name: beszel-agent
    network_mode: host
    environment:
      PORT: 45876
      KEY: "ssh-ed25519 XXXXXX"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    restart: unless-stopped

And I have only one system agent, running together with the hub on my Raspberry Pi 4.

Docker version is the following:

Client:
 Version:           20.10.24+dfsg1
 API version:       1.41
 Go version:        go1.19.8
 Git commit:        297e128
 Built:             Sat Oct 12 15:19:49 2024
 OS/Arch:           linux/arm64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.24+dfsg1
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.19.8
  Git commit:       5d6db84
  Built:            Sat Oct 12 15:19:49 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.20~ds1
  GitCommit:        1.6.20~ds1-1+b1
 runc:
  Version:          1.1.5+ds1
  GitCommit:        1.1.5+ds1-1+deb12u1
 docker-init:
  Version:          0.19.0
  GitCommit:

Could the drift happen because of the lack of an RTC in my Pi, or maybe it's just not powerful enough?

@henrygd (Owner) commented Jan 17, 2025

Pi 4 should easily be powerful enough, but you could try testing the hub on a different machine to see if the problem continues.

RTC is an interesting theory. I'm not very knowledgeable on that, but I'd assume it should be fine as long as NTP works.
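
One quick way to check that, assuming a systemd-based OS such as Raspberry Pi OS, is:

timedatectl

Depending on the systemd version, the output should include lines like "System clock synchronized: yes" and "NTP service: active".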

I'd recommend upgrading Docker if possible. The oldest version I've tested is 24, because Synology Container Manager forces people to use that, and I had to add a workaround to get it working. Docker 20 may have its own issues.

@Korveck14 (Author)

After some searching on the internet about time drift on the Raspberry Pi, I applied the force_turbo=1 fix suggested in https://forums.raspberrypi.com/viewtopic.php?t=337797, and the problem seems to be resolved.

So far there has been no time drift, and the gaps in the graphs have disappeared for the past six hours. I'll continue monitoring to see if the issue reappears, but for now this solution looks promising, even if it is not the best one, since it forces the CPU to run at maximum speed, which will affect power consumption and temperature.

[Image]
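
For reference, the fix from that forum thread is a one-line addition to the Pi's boot config (the path may be /boot/firmware/config.txt on newer Raspberry Pi OS releases):

# /boot/config.txt
# Pins the CPU at its maximum clock. Works around the timer drift,
# at the cost of higher power consumption and temperature.
force_turbo=1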

@henrygd (Owner) commented Jan 18, 2025

Wow, nice job figuring that out!

Unfortunate solution, but at least you know the problem.

Maybe you could try auto-cpufreq and see if any configuration options fix the clock without needing to force turbo.

Otherwise, this should only affect the hub, so moving the hub to a different system would be an option.
