[CRITICAL BUG] Grafana Promtail - CRITICAL issue - initial scrape config causes 100% CPU- and Memory load #11398
Comments
@JStickler, `scrape_configs:` …
As described in the issue description, the initial config could point to a single dedicated file that exists (e.g. /var/log/messages) or to a file that does not exist (e.g. /var/log/file.log). Please create a new RPM package with one of these options; this would fix the issue.
@janfickler I'm a technical writer, not a developer. But I'll see if anyone on the team has time to look at this issue.
@janfickler, thanks for reporting the issue. It's unfortunate that it's impacting your production systems. How many files are we talking about here in …
I'm not sure I understand what you mean here. Can you elaborate? Thanks
@kavirajk, as I said, this affects all RPM- and DEB-based systems, because /var/log/ is never empty; the initial config should point to a single dedicated file or to a file that does not exist. This affects not only our systems; it should affect all customers. I tested it: right after the initial install, the systems become overloaded (CPU 100%, memory 100%). The fix would be easy: in config.yml, change `scrape_configs:` …
to: `scrape_configs:` …
That is, change /var/log/*log to /var/log/file.log, /var/log/messages, or /var/log/kern.log, and everything should be fine; from that you can then build the new RPM/DEB package. I also checked the platforms: KVM, VMware vSphere, and Azure-based VMs all had the same issue on Ubuntu, CentOS 7/8, AlmaLinux 8/9, etc. By the way, I tested afterwards with my own config pointing to separate dedicated files, and it works without problems.
Thanks for the explanation, @janfickler. It sounds very reasonable that the initial config should not overload the system at startup. Now we need to figure out which file or files would be the right choice (instead of …). This also made me think: the problem can still happen if a user deliberately adds … So, in the long term, there is an opportunity to add some kind of upper bound on the number of files (possibly also based on file size) that Promtail can scrape at once without overloading the whole system. Currently it starts tailing all matched files at once, and each file is handled by a separate goroutine. My proposal is the following:
With (2), we also have to make sure this change is available in Grafana Agent.
I think you could use a file like /var/log/messages, or another file that is always present after the initial OS installation. In my experience, RPM- and DEB-based systems normally have the same files in /var/log/. One more piece of information: it could be that the pattern /var/log/*log also matches this file and additionally causes the problems. For now, I think Option 1 would be a good workaround. Option 2 is a good way to make it bulletproof, but I guess it should be tested more deeply in case the issue does not depend only on files with binary content; maybe if Promtail detects binary content, it should ignore the file. But I suggest developing this in the long term.
Related issue: #11398. This minimal config scrapes only a single file, so it does not overload the system as described in the issue. Signed-off-by: Kaviraj <[email protected]>
@kavirajk, I guess you are on holiday :-)
Any update, @kavirajk?
@JStickler, is there any way to push this a little?
@janfickler end of December is when a lot of the team takes vacation because if they don't use their time off, they lose it. So it's understandable that not much has happened in the past two or three weeks. But people are coming back to work, and I see that @kavirajk has already commented on the PR that he's taking a look at it.
…ackaging. (#11676)
Backport 86f2001 from #11511
---
**What this PR does / why we need it**:
Related issue: #11398. This minimal config scrapes only a single file, so it does not overload the system as described in the issue.
**Which issue(s) this PR fixes**:
Fixes #<issue number>
**Special notes for your reviewer**:
**Checklist**
- [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**)
- [x] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](d10549e)
- [ ] If the change is deprecating or removing a configuration option, update the `deprecated-config.yaml` and `deleted-config.yaml` files respectively in the `tools/deprecated-config-checker` directory. [Example PR](0d4416a)

Co-authored-by: Kaviraj Kanagaraj <[email protected]>
Co-authored-by: Poyzan <[email protected]>
@poyzannur / @kavirajk - there is still no new version 2.9.4 released ? |
@poyzannur can confirm, that rpm-/deb-package is now available, thx a lot :-) https://github.com/grafana/loki/releases/tag/v2.9.4 `Resolving Dependencies Dependencies Resolved ==================================================================================================================================================================================================================
|
thx for the support guys :-) |
Describe the bug
The RPM packages for "promtail" from the official Grafana OSS repository include an initial scrape config that drives CPU and memory usage to 100% immediately after the RPM installation and the automatic service start.
This makes the systems completely unmanageable. SSH still connects, but under this load it is unusable.
Affected configuration: /etc/promtail/config.yml
```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  # …

scrape_configs:
  # …
  static_configs:
    # …
    labels:
      job: varlogs
      __path__: /var/log/*log
```
After the service is stopped (by any means) and the config is deleted or changed, the application works properly. The problem is that this initial config tails every file directly under /var/log/ whose name ends in "log" (the glob /var/log/*log), which puts the systems under enormous pressure.
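Go's `path/filepath.Match` implements shell-style matching in which `*` matches any run of characters except the path separator. Assuming Promtail treats a simple pattern like `/var/log/*log` the same way (it may use a different glob library internally; that's an assumption), the sketch below shows which illustrative file names the default pattern picks up compared with a single-file path. The names are examples, not a listing of any particular system; `matches` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// matches reports whether name matches the shell-style pattern;
// a malformed pattern is treated as "no match".
func matches(pattern, name string) bool {
	ok, err := filepath.Match(pattern, name)
	return err == nil && ok
}

func main() {
	patterns := []string{"/var/log/*log", "/var/log/messages"}
	// Illustrative names only; which files exist varies by distribution.
	files := []string{
		"/var/log/messages",
		"/var/log/kern.log",
		"/var/log/boot.log",
		"/var/log/lastlog",
		"/var/log/syslog",
		"/var/log/audit/audit.log", // * does not cross "/", so no match
	}
	for _, p := range patterns {
		for _, f := range files {
			fmt.Printf("%-18s %-26s %v\n", p, f, matches(p, f))
		}
	}
}
```

Note that `*log` selects by name suffix, so it also picks up files such as `lastlog` that are not plain-text logs, while `/var/log/messages` does not end in "log" and is not matched by the default pattern at all.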
To Reproduce
(Tested on CentOS 7 and AlmaLinux 8 systems, so all Red Hat-based systems, or all systems that use RPM packages, are affected.)
Expected behavior
Possible Solution
Affected systems
Criticality