Log spam from orchagent supervisord #21157

Open
mint570 opened this issue Dec 13, 2024 · 8 comments
Labels
MSFT Triaged this issue has been triaged

Comments

@mint570

mint570 commented Dec 13, 2024

Description

The following messages are written to syslog about every 10 seconds:

swss#supervisord: orchagent
swss#supervisord: RESULT 2
swss#supervisord: OKREADY
swss#supervisord: orchagent
swss#supervisord: RESULT 2
swss#supervisord: OKREADY
swss#supervisord: orchagent
swss#supervisord: RESULT 2
swss#supervisord: OKREADY

This is due to the "stdout_capture_maxbytes=1MB" supervisord config in orchagent introduced by the watchdog change: #15429

For devices with limited storage, this fills up the log faster than necessary.
Is there a way to implement the watchdog feature without printing the "heart beat" messages?

Steps to reproduce the issue:

Check /var/log/syslog file
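
To put a number on the spam, a small script along these lines could count the three messages quoted above. This is only a sketch: the regular expression is an assumption built from the log lines in this issue, `count_spam` and `count_spam_in_file` are hypothetical helper names, and reading `/var/log/syslog` normally needs root on the device.

```python
import re
from collections import Counter

# Assumed pattern: matches only the three messages quoted in this issue.
SPAM = re.compile(r"swss#supervisord: (orchagent|RESULT 2|OKREADY)\s*$")


def count_spam(lines):
    """Return a Counter of the quoted supervisord messages found in `lines`."""
    counts = Counter()
    for line in lines:
        match = SPAM.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_spam_in_file(path="/var/log/syslog"):
    """Count the quoted messages in a syslog file (hypothetical helper)."""
    with open(path, errors="replace") as handle:
        return count_spam(handle)
```

Calling `count_spam_in_file()` on the device and dividing by the time span covered by the file gives a rough messages-per-minute figure.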

Describe the results you received:

N/A

Describe the results you expected:

N/A

@tjchadaga

@liuh-80 - please help take a look

@liuh-80

liuh-80 commented Dec 23, 2024

stdout_capture_maxbytes does not generate any new syslog entries; it only captures the process's stdout and filters out the heartbeat message.

I checked the PR https://github.com/sonic-net/sonic-buildimage/pull/15429/files#diff-36bcb1718eaced89b21a75ef897cd156ba095e04b798915579ac758630007e62

I did not find any code there that prints these logs.

@liuh-80

liuh-80 commented Dec 23, 2024

I will check where the code prints these logs and remove them. They are not part of the heartbeat message:
swss#supervisord: orchagent
swss#supervisord: RESULT 2
swss#supervisord: OKREADY

@liuh-80

liuh-80 commented Dec 23, 2024

Tested with a POC image:
https://github.com/sonic-net/sonic-buildimage/pull/21265/checks?check_run_id=34785973518

After removing stdout_capture_maxbytes, the following heartbeat messages appear:

2024 Dec 23 12:48:13.849809 vlab-01 INFO swss#supervisord: orchagent heartbeat
2024 Dec 23 12:53:08.024753 vlab-01 INFO swss#supervisord: message repeated 28 times: [ orchagent heartbeat]

So stdout_capture_maxbytes does not generate any new log; it just caches stdout so it can be filtered.
And I guess any process managed by supervisord with stdout_syslog enabled has its stdout redirected to syslog.

After enabling stdout_capture_maxbytes, the message between the capture begin and end tokens is captured and removed from stdout, converted into a supervisord event, and sent to proc-exit-listener, so only the following log is left in syslog:

swss#supervisord: orchagent
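
The filtering described above can be sketched roughly like this. The two token strings are the ones supervisord documents for process communication events; everything else is an assumption for illustration, including the helper name `split_captured` and the guess that the heartbeat line is emitted as `orchagent ` followed by a wrapped `heartbeat` payload. It is not supervisord's actual implementation.

```python
# Tokens supervisord uses to delimit "capture mode" on a child's stdout.
BEGIN = "<!--XSUPERVISOR:BEGIN-->"
END = "<!--XSUPERVISOR:END-->"


def split_captured(stream):
    """Split stdout text into (passed_through_text, captured_payloads).

    Text between BEGIN and END is treated as captured (it would become an
    event payload); everything else passes through to the normal stdout log.
    """
    passed, captured = [], []
    pos = 0
    while True:
        start = stream.find(BEGIN, pos)
        if start == -1:
            # No more capture sections: the rest passes through.
            passed.append(stream[pos:])
            break
        passed.append(stream[pos:start])
        end = stream.find(END, start + len(BEGIN))
        if end == -1:
            # Unterminated capture: treat the remainder as still captured.
            break
        captured.append(stream[start + len(BEGIN):end])
        pos = end + len(END)
    return "".join(passed), captured


# Assumed shape of one heartbeat line, for illustration only.
logged, events = split_captured("orchagent " + BEGIN + "heartbeat" + END + "\n")
```

Under that assumed line shape, `logged` still contains `orchagent`, which would explain why that one word keeps showing up in syslog while `heartbeat` does not.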

As for the other logs, after checking both the latest master branch and the POC image, I can't find the following:
swss#supervisord: RESULT 2
swss#supervisord: OKREADY
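
One possible reading of these two lines, not confirmed anywhere in this thread: they match what a supervisord event listener writes to its own stdout under the event listener protocol, where the listener prints `READY\n` when idle and answers each event with `RESULT 2\nOK` (no trailing newline). If that stream were logged line by line, it would split exactly into `RESULT 2` and `OKREADY`. The sketch below only reproduces the byte stream; `listener_stdout` is a hypothetical helper, and whether any listener's stdout actually reaches syslog in the affected image is an assumption.

```python
import io


def listener_stdout(events):
    """Return the text an event listener writes to stdout for `events`
    notifications that it acknowledges with OK."""
    out = io.StringIO()
    out.write("READY\n")  # listener announces it can accept an event
    for _ in range(events):
        # Result header line, then the 2-byte body "OK" without a newline.
        out.write("RESULT 2\nOK")
        out.write("READY\n")  # ready for the next event
    return out.getvalue()


lines = listener_stdout(2).splitlines()
```

After the first `READY`, every acknowledged event contributes one `RESULT 2` line and one `OKREADY` line, which is the repeating pattern reported in the description.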

@liuh-80

liuh-80 commented Dec 23, 2024

@mint570, can you share the image version on which you get the following logs? I can't find them in the latest master image:
swss#supervisord: RESULT 2
swss#supervisord: OKREADY

@liuh-80

liuh-80 commented Dec 23, 2024

Also, removing the stdout_syslog line from the following config would mitigate this issue, but I think this behavior is by design:

[program:orchagent]
command=/usr/bin/orchagent.sh
priority=4
autostart=false
autorestart=false
stdout_logfile=NONE
stdout_syslog=true <== supervisord writes orchagent stdout to syslog.
stderr_logfile=NONE

@mint570

mint570 commented Dec 23, 2024

We used a custom image based on the 202305 branch.

We cannot remove the "stdout_syslog=true" config as it is by design.

@liuh-80

liuh-80 commented Dec 24, 2024

@mint570, I checked with the latest 202305 branch image and can't find the following logs. Please check your custom image to make sure they are not caused by a code change there:

swss#supervisord: RESULT 2
swss#supervisord: OKREADY

Here is the image I use:

SONiC Software Version: SONiC.202305.728811-7b88a427b
SONiC OS Version: 11
Distribution: Debian 11.8
Kernel: 5.10.0-23-2-amd64
Build commit: 7b88a42
Build date: Mon Dec 23 12:16:49 UTC 2024
Built by: azureuser@88ac6c70c000002
