metrics: show system boots in metrics #21444

Merged
merged 1 commit into from
Dec 19, 2024

Conversation

jelly
Member

@jelly jelly commented Dec 17, 2024

Show a boot as a metric event in the historical metrics overview. A boot is likely to cause high CPU/memory spikes, so it is useful for a system administrator to be aware of them. We obtain the boot information from systemd, as last is deprecated and not all distros use lastlog2, while journalctl is freely available.
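
As a rough illustration of the idea (not the code in this PR, which lives in pkg/metrics/metrics.jsx), the plain-text table printed by `journalctl --list-boots` can be turned into boot records as sketched below. The names `Boot` and `parse_list_boots` are hypothetical, the sample is the Arch output quoted later in this thread, and the sketch assumes the timestamps are printed in UTC as they are there.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Boot(NamedTuple):
    index: int            # 0 is the current boot, -1 the one before it, ...
    boot_id: str
    first_entry: datetime
    last_entry: datetime


def _parse_timestamp(date: str, time: str) -> datetime:
    # Assumption: the journal timestamps are shown in UTC.
    parsed = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def parse_list_boots(output: str) -> list[Boot]:
    """Parse the plain-text table printed by `journalctl --list-boots`."""
    boots = []
    for line in output.splitlines():
        fields = line.split()
        # Expected: IDX, BOOT ID, then two "Day YYYY-MM-DD HH:MM:SS TZ" groups.
        if len(fields) != 10 or not fields[0].lstrip("-").isdigit():
            continue  # header row or an unexpected line
        first = _parse_timestamp(fields[3], fields[4])
        last = _parse_timestamp(fields[7], fields[8])
        boots.append(Boot(int(fields[0]), fields[1], first, last))
    return boots


SAMPLE = """\
IDX BOOT ID                          FIRST ENTRY                 LAST ENTRY
 -1 d1b4578c55c1435f9f5aac81889f1999 Mon 2021-03-08 10:35:47 UTC Mon 2021-03-08 10:41:54 UTC
  0 bfd5c2d9a2b04503a636041533cd59fb Wed 2024-12-18 15:59:46 UTC Mon 2021-03-08 11:08:55 UTC
"""

boots = parse_list_boots(SAMPLE)
```

Each record's `first_entry` is the moment a boot event would be placed on the timeline; the header row is skipped because it does not split into ten fields.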

Closes: #15983


Show system boot in metrics

image

@jelly jelly added release-note no-test For doc/workflow changes, or experiments which don't need a full CI run, labels Dec 17, 2024
@jelly jelly removed the no-test For doc/workflow changes, or experiments which don't need a full CI run, label Dec 17, 2024
Member

@martinpitt martinpitt left a comment

Thanks! Some small stuff, but by and large looks good. I don't have an off-hand idea why it doesn't work on Ubuntu. Does that reproduce locally or is it a race?

The arch failure is weird, though:

wait_js_cond(ph_in_text(".metrics-minute[data-minute='35']","Boot")): Error: actual text: 10:35 AMBoot

Like what now? The text is right there!?

@jelly
Member Author

jelly commented Dec 18, 2024

> Thanks! Some small stuff, but by and large looks good. I don't have an off-hand idea why it doesn't work on Ubuntu. Does that reproduce locally or is it a race?

I'll check again, but so far I haven't been able to reproduce any of this locally.

> The arch failure is weird, though:
>
> wait_js_cond(ph_in_text(".metrics-minute[data-minute='35']","Boot")): Error: actual text: 10:35 AMBoot
>
> Like what now? The text is right there!?

Right, I can't reproduce this, so I assume it is a race.

@jelly jelly marked this pull request as ready for review December 18, 2024 13:22
@@ -1621,8 +1639,24 @@ class MetricsHistory extends React.Component {
} catch (_ex) {}

const isBeibootBridge = cmdline?.includes("ic# cockpit-bridge");

Member Author

woops! Can fix in a repush

@jelly
Member Author

jelly commented Dec 18, 2024

On Ubuntu the journal doesn't seem to get picked up?

root@m1:~# journalctl --list-boots
IDX BOOT ID                          FIRST ENTRY                 LAST ENTRY
 -1 3a67d55d86ea41549ac55950991222ae Wed 2024-12-18 15:17:51 UTC Mon 2021-03-08 11:45:01 UTC
  0 1776c19e4e9f48b99158f0a265b864c4 Wed 2024-12-18 15:15:37 UTC Wed 2024-12-18 15:15:40 UTC
root@m1:~# date
Mon Mar  8 11:50:41 UTC 2021

Versus for example Arch:

[root@m1 ~]# journalctl --list-boots
IDX BOOT ID                          FIRST ENTRY                 LAST ENTRY
 -1 d1b4578c55c1435f9f5aac81889f1999 Mon 2021-03-08 10:35:47 UTC Mon 2021-03-08 10:41:54 UTC
  0 bfd5c2d9a2b04503a636041533cd59fb Wed 2024-12-18 15:59:46 UTC Mon 2021-03-08 11:08:55 UTC
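
To make the oddities in the two tables easier to spot, here is a minimal sketch of two sanity checks over the rows quoted above. The function names are hypothetical, and the thread does not establish which of these conditions, if any, confuses the metrics page; the sketch only shows what "going back in time" looks like in the data.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def last_before_first(boots):
    """Return the IDX of every boot whose last entry predates its first entry."""
    return [
        idx
        for idx, first, last in boots
        if datetime.strptime(last, FMT) < datetime.strptime(first, FMT)
    ]


def first_entries_in_order(boots):
    """Check that first entries do not decrease from older to newer boots."""
    firsts = [datetime.strptime(first, FMT) for _idx, first, _last in boots]
    return all(a <= b for a, b in zip(firsts, firsts[1:]))


# (IDX, FIRST ENTRY, LAST ENTRY) copied from the outputs above, all UTC.
UBUNTU = [
    (-1, "2024-12-18 15:17:51", "2021-03-08 11:45:01"),
    (0, "2024-12-18 15:15:37", "2024-12-18 15:15:40"),
]
ARCH = [
    (-1, "2021-03-08 10:35:47", "2021-03-08 10:41:54"),
    (0, "2024-12-18 15:59:46", "2021-03-08 11:08:55"),
]

ubuntu_report = (last_before_first(UBUNTU), first_entries_in_order(UBUNTU))
arch_report = (last_before_first(ARCH), first_entries_in_order(ARCH))
```

Both images contain a boot whose last entry is older than its first, but only the Ubuntu table lists the previous boot as starting after the current one.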

The journal entries for Ubuntu:

root@m1:~# ls -lhtr /var/log/journal/ff95f672a3504a8499f522693185203b/
total 47M
-rw-r-----+ 1 root systemd-journal 8.0M Aug 17  2020 system@6df91f9c32f847cfac05aea8d7ca1e78-0000000000000f75-0006298ceacd7a09.journal
-rw-r-----+ 1 root systemd-journal 8.0M Sep 19 21:24 system@6df91f9c32f847cfac05aea8d7ca1e78-0000000000001449-0005ad132fb62039.journal
-rw-r-----+ 1 root systemd-journal 8.0M Sep 19 21:24 user-1000@6df91f9c32f847cfac05aea8d7ca1e78-0000000000001474-0005ad1330065d91.journal
-rw-r-----+ 1 root systemd-journal 8.0M Mar  8 10:48 system@6df91f9c32f847cfac05aea8d7ca1e78-0000000000001552-0005afb13e5e8147.journal
-rw-r-----+ 1 root systemd-journal 8.0M Mar  8 10:48 user-1000@6df91f9c32f847cfac05aea8d7ca1e78-0000000000001589-0005afb13e8531f5.journal
-rw-r--r--+ 1 root systemd-journal 8.0M Mar  8 10:48 journal.journal
-rw-r-----+ 1 root systemd-journal 8.0M Mar  8 10:57 user-1000.journal
-rw-r-----+ 1 root systemd-journal 8.0M Mar  8 11:50 system.journal
-rw-r-----+ 1 root systemd-journal 8.0M Dec 18  2024 system@863a07c5a3224916b55281f5f2a9a169-0000000000000ed2-0006298ce2d21091.journal
root@m1:~# uname -a

For Arch:

[root@m1 ~]# ls -lhtr /var/log/journal/65ec27b2fd354a538994e2579401d760/
total 43M
-rw-r-----+ 1 root systemd-journal 8.0M Aug 17  2020 system@1472025ac294433f914c31fb19ba7d3e-0000000000000001-0006298d80ae2b4a.journal
-rw-r-----+ 1 root systemd-journal 8.0M Sep 19 21:24 system@1472025ac294433f914c31fb19ba7d3e-00000000000004ef-0005ad132fb61893.journal
-rw-r-----+ 1 root systemd-journal 8.0M Sep 19 21:24 user-1000@1472025ac294433f914c31fb19ba7d3e-000000000000059a-0005ad132fd40c98.journal
-rw-r-----+ 1 root systemd-journal 8.0M Mar  8 10:48 system@1472025ac294433f914c31fb19ba7d3e-0000000000000732-0005afb13e5e6a9d.journal
-rw-r-----+ 1 root systemd-journal 8.0M Mar  8 10:48 user-1000@1472025ac294433f914c31fb19ba7d3e-00000000000007cc-0005afb13e7dca99.journal
-rw-r--r--+ 1 root systemd-journal 8.0M Mar  8 10:48 journal.journal
-rw-r-----+ 1 root systemd-journal 8.0M Mar  8 10:52 user-1000.journal
-rw-r-----+ 1 root systemd-journal 8.0M Mar  8 11:08 system.journal
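
The file modification times in these listings say little about what the journals contain. A minimal sketch for cross-checking them, assuming archived journal files follow systemd's `<prefix>@<seqnum id>-<head seqnum>-<head realtime>.journal` naming, where the last field is the first entry's timestamp in microseconds since the epoch, written in hex. `head_entry_time` is a hypothetical helper, not part of Cockpit or systemd.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed layout of an archived journal file name (see lead-in).
ARCHIVED = re.compile(
    r"^(?P<prefix>[^@]+)@(?P<seqnum_id>[0-9a-f]{32})"
    r"-(?P<seqnum>[0-9a-f]{16})-(?P<realtime>[0-9a-f]{16})\.journal$"
)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def head_entry_time(filename):
    """Return the UTC time of the first entry encoded in an archived journal
    file name, or None for active files such as system.journal."""
    match = ARCHIVED.match(filename)
    if match is None:
        return None
    microseconds = int(match.group("realtime"), 16)
    # Integer arithmetic via timedelta avoids float rounding of the timestamp.
    return EPOCH + timedelta(microseconds=microseconds)


name = "system@6df91f9c32f847cfac05aea8d7ca1e78-0000000000001449-0005ad132fb62039.journal"
head = head_entry_time(name)
```

For the Ubuntu file shown here, the name decodes to 2020-08-17 14:00 UTC, even though the listing shows a modification time of Sep 19.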

Show a boot as a metric event in the historical metrics overview. A
boot is likely to cause high CPU/memory spikes, so it is useful for a
system administrator to be aware of them. We obtain the boot
information from systemd, as `last` is deprecated and not all distros
use lastlog2, while `journalctl` is available everywhere.

Fixes cockpit-project#15983
@martinpitt
Member

So on Ubuntu, even if I remove all other system*.journal files and rename journal.journal to system.journal, the list-boots list is still wrong:

 0 a6fec5b7e903464e9899394e18e6f49b Mon 2020-08-17 14:00:03 UTC Mon 2021-03-08 10:52:01 UTC

journalctl indeed starts with "Aug 17" to "Sep 19" and then continues with a boot on March 8, which is utterly wrong. These come from the user@ logs.

@martinpitt
Member

I pushed this rather simple diff (simple to review, that is; it was not so simple to come up with):

--- test/verify/check-metrics
+++ test/verify/check-metrics
@@ -440,7 +440,10 @@ class TestHistoryMetrics(testlib.MachineCase):
             return
 
         m.upload(["verify/files/metrics-archives/journal.journal.gz"], "/tmp")
+        # we need to move all other existing journals out of the way, otherwise boot order is going back in time
         m.execute("""gunzip /tmp/journal.journal.gz
+                     systemctl stop systemd-journald
+                     rm /var/log/journal/*/*.journal
                      cp /tmp/journal.journal /var/log/journal/*/""")
         b.reload()
         b.enter_page("/metrics")

This works for me locally. Let the bots have their deliberations!

Comment on lines +1657 to +1658
} catch (exc) {
console.warn("journalctl --list-boots failed", exc);
Contributor

These 2 added lines are not executed by any test.
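
One way such an error path could be covered, sketched here as a Python analogue rather than as Cockpit's actual JavaScript: make the command runner injectable so a test can force the failure and check the fallback. `list_boots` and `failing_run` are hypothetical names used only for this illustration.

```python
import logging
import subprocess

logger = logging.getLogger("metrics")


def list_boots(run=subprocess.run):
    """Return the raw `journalctl --list-boots` output, or "" if it fails."""
    try:
        result = run(
            ["journalctl", "--list-boots"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        # Mirrors the warn-and-continue behaviour of the catch block above.
        logger.warning("journalctl --list-boots failed: %s", exc)
        return ""
    return result.stdout


def failing_run(*args, **kwargs):
    # Test double: behaves like a journalctl call that exits with status 1.
    raise subprocess.CalledProcessError(1, args[0])


output_on_failure = list_boots(run=failing_run)
```

With the failing runner injected, the warning branch runs and the caller receives an empty result instead of an exception.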

@martinpitt martinpitt merged commit 11728bd into cockpit-project:main Dec 19, 2024
85 checks passed
@jelly jelly deleted the metrics-reboots branch December 21, 2024 09:32
Development

Successfully merging this pull request may close these issues.

metrics: History doesn't include reboots, but should
3 participants