fix(input.intel_pmt): Handle telem devices without numa_node attribute #13977
Currently, enabling the intel_pmt input plugin causes Telegraf to fail on startup like so:

```
telegraf[47878]: 2023-09-21T22:27:09Z E! [telegraf] Error running agent: could not initialize input inputs.intel_pmt: error while exploring pmt sysfs: error while evaluating symlink "/sys/class/intel_pmt/telem0/numa_node": lstat /sys/devices/pci0000:00/0000:00:0a.0/intel_vsec.telemetry.0/intel_pmt/telem0/numa_node: no such file or directory
```

While intel_pmt telem devices do not have a `numa_node` attribute, their parent intel_vsec devices do. For example, the current behavior:

```
$ ls -l /sys/class/intel_pmt/telem0/device/numa_node
ls: cannot access '/sys/class/intel_pmt/telem0/device/numa_node': No such file or directory
```

Versus traversing up to the intel_vsec device:

```
$ ls -l /sys/class/intel_pmt/telem0/device/../numa_node
-rw-r--r--. 1 root root 4096 Sep 18 14:13 /sys/class/intel_pmt/telem0/device/../numa_node
```

Thus, update `explorePmtInSysfs()` to traverse up to the intel_vsec device to find `numa_node`.

Note that `filepath.Join()` cleans the `..` lexically, before `filepath.EvalSymlinks()` ever sees the path. We therefore call `filepath.EvalSymlinks()` on `/sys/class/intel_pmt/telem0/device` first and then use `filepath.Join()` to traverse up.

Tested on Fedora 38's 6.4.14-200.fc38.x86_64 kernel and openSUSE 15.4's 5.14.21-150400.24.55-default kernel.
Thanks so much for the pull request!
!signed-cla
@jakubsikorski it would be good to have you review this if possible.
Download PR build artifacts for linux_amd64.tar.gz, darwin_amd64.tar.gz, and windows_amd64.zip. 👍 This pull request doesn't change the Telegraf binary size.
Hi Ben,
It seems different kernel versions behave differently here. This depends on the `pmt_telemetry`/`intel_vsec` device driver.

If the driver is on `bus/auxiliary`, there won't be a `numa_node` file and it will fail, just as you described. If the driver is on `bus/platform`, the `numa_node` file will be present and it will work as expected.

When traversing up, the parent's driver will be on `bus/pci`. It could be `intel_vsec`, as you described for newer kernels, or `intel_pmt`, but it will be a PCI driver, which should always have a `numa_node` file, even if the node is unknown (the value is then -1). So the changes you propose will indeed fix this.
Thanks for finding this.
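The review notes that a PCI device's `numa_node` file reads `-1` when the node is unknown. Below is a hedged Go sketch of how a caller might handle that value. `parseNumaNode` and `readNumaNode` are hypothetical helpers, not part of the intel_pmt plugin, and treating every negative value as "unknown" is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNumaNode is a hypothetical helper: it parses the contents of a sysfs
// numa_node file and reports known=false for negative values such as -1.
func parseNumaNode(raw string) (node int, known bool, err error) {
	node, err = strconv.Atoi(strings.TrimSpace(raw))
	if err != nil {
		return 0, false, fmt.Errorf("invalid numa_node value %q: %w", raw, err)
	}
	// Assumption: any negative value means the kernel does not know the node.
	return node, node >= 0, nil
}

// readNumaNode is a hypothetical wrapper that reads and parses a numa_node file.
func readNumaNode(path string) (int, bool, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, false, err
	}
	return parseNumaNode(string(data))
}

func main() {
	// Sample file contents, including the "-1" value for an unknown node.
	for _, raw := range []string{"0\n", "1\n", "-1\n"} {
		node, known, err := parseNumaNode(raw)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%q -> node=%d known=%t\n", raw, node, known)
	}
}
```

Returning an explicit `known` flag keeps `-1` from being mistaken for a real node index by code that uses the value as a tag or array index.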
@p-zak @jakubsikorski thank you for reviewing!
Thanks for fixing this!
resolves #13976