config reload fail due to monit socket connection fail #21268

Open
yejianquan opened this issue Dec 24, 2024 · 4 comments
Labels: Issue for 202405, P1 (priority of the issue, lower than P0)

Comments

@yejianquan
Contributor

Description

There is a chance that config reload / load_minigraph fails because the connection to the monit socket fails.
It was first noticed around 12/20 and appears to be a flaky, timing-related issue.
I suspect it is related to sonic-net/sonic-utilities#3682. @abdosi could you please take a look?
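
If this really is a race between the monit daemon being reinitialized and the next monit command, one way to probe that hypothesis is to wait for the socket file before talking to monit. A minimal sketch, assuming the socket path from the error message in this report; `wait_for_socket` is a hypothetical helper and is not part of sonic-utilities, and the timeout and interval values are arbitrary:

```python
import os
import time


def wait_for_socket(path="/var/run/monit.sock", timeout=30.0, interval=0.5):
    """Poll until `path` exists or `timeout` seconds elapse.

    Returns True if the path appeared in time, False otherwise.
    The default path comes from the reported error message; the timeout
    and interval defaults are assumptions chosen for illustration.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

If the helper returns False on a device where the reload fails, that would support the timing theory; if it returns True and the command still fails, the cause is probably elsewhere.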

Steps to reproduce the issue:

It has been seen in several config reload scenarios:

  1. config load_minigraph --override_config -y
  2. config reload -y -f -l /etc/sonic/running_golden_config.json,/etc/sonic/running_golden_config0.json,/etc/sonic/running_golden_config1.json,/etc/sonic/running_golden_config2.json,/etc/sonic/running_golden_config3.json,/etc/sonic/running_golden_config4.json,/etc/sonic/running_golden_config5.json,/etc/sonic/running_golden_config6.json,/etc/sonic/running_golden_config7.json,/etc/sonic/running_golden_config8.json,/etc/sonic/running_golden_config9.json,/etc/sonic/running_golden_config10.json,/etc/sonic/running_golden_config11.json,/etc/sonic/running_golden_config12.json,/etc/sonic/running_golden_config13.json,/etc/sonic/running_golden_config14.json,/etc/sonic/running_golden_config15.json &>/dev/null
  3. config reload -y -f &>/dev/null
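
Because the failure is intermittent, it can help to run the command several times and keep stderr instead of discarding it. A rough sketch of such a loop; `run_repeatedly` and `DEMO_CMD` are hypothetical names, and the demo uses a harmless child process rather than a real `config reload`:

```python
import subprocess
import sys


def run_repeatedly(cmd, attempts):
    """Run `cmd` (an argument list) `attempts` times.

    Returns one (attempt_index, return_code, stderr_text) tuple per failed run.
    """
    failures = []
    for attempt in range(attempts):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((attempt, result.returncode, result.stderr.strip()))
    return failures


# Stand-in command that always fails, used only to demonstrate the helper.
DEMO_CMD = [sys.executable, "-c", "import sys; print('boom', file=sys.stderr); sys.exit(1)"]

if __name__ == "__main__":
    print(run_repeatedly(DEMO_CMD, attempts=2))
```

On a test device one might pass `["config", "reload", "-y", "-f"]` instead of `DEMO_CMD`. Each run reloads the configuration, so this is only suitable for a testbed.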

Describe the results you received:

On PR KVM test plans:
https://elastictest.org/scheduler/testplan/6768f5a49e7fdf9b25e4066e?testcase=testbed_q_sonic-elastictest-prod-vmss-E8s-v3_249428_vms-kvm-t1-lag_prepare.log&type=prepare

TASK [execute cli "config load_minigraph --override_config -y" to apply new minigraph] ***
Monday 23 December 2024  05:59:45 +0000 (0:00:00.818)       0:00:21.232 ******* 
fatal: [vlab-03]: FAILED! => {"changed": true, "cmd": "config load_minigraph --override_config -y", "delta": "0:00:48.840209", "end": "2024-12-23 06:00:34.788892", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2024-12-23 05:59:45.948683", "stderr": "Unix socket /var/run/monit.sock connection error -- No such file or directory", "stderr_lines": ["Unix socket /var/run/monit.sock connection error -- No such file or directory"], "stdout": "Acquired lock on /etc/sonic/reload.lock\nDisabling container and routeCheck monitoring ...\nStopping SONiC target ...\nRunning command: /usr/local/bin/sonic-cfggen -H -m -j /etc/sonic/init_cfg.json --write-to-db\nRunning command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment\nRunning command: config qos reload --no-dynamic-buffer --no-delay\nRunning command: /usr/local/bin/sonic-cfggen -d -t /usr/share/sonic/device/x86_64-kvm_x86_64-r0/Force10-S6000/buffers.json.j2,/tmp/cfg_buffer.json -t /usr/share/sonic/device/x86_64-kvm_x86_64-r0/Force10-S6000/qos.json.j2,/tmp/cfg_qos.json -y /etc/sonic/sonic_version.yml\nRunning command: /usr/local/bin/sonic-cfggen -j /tmp/cfg_buffer.json -j /tmp/cfg_qos.json --write-to-db\nRunning command: pfcwd start_default\nRunning command: config override-config-table /etc/sonic/golden_config_db.json\nRemoving configDB overriden table first ...\nOverriding input config to configDB ...\nOverriding completed. 
No service is restarted.\nRestarting SONiC target ...\nReloading Monit configuration ...\nReinitializing monit daemon\nEnabling container and routeCheck monitoring ...\nReleased lock on /etc/sonic/reload.lock", "stdout_lines": ["Acquired lock on /etc/sonic/reload.lock", "Disabling container and routeCheck monitoring ...", "Stopping SONiC target ...", "Running command: /usr/local/bin/sonic-cfggen -H -m -j /etc/sonic/init_cfg.json --write-to-db", "Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment", "Running command: config qos reload --no-dynamic-buffer --no-delay", "Running command: /usr/local/bin/sonic-cfggen -d -t /usr/share/sonic/device/x86_64-kvm_x86_64-r0/Force10-S6000/buffers.json.j2,/tmp/cfg_buffer.json -t /usr/share/sonic/device/x86_64-kvm_x86_64-r0/Force10-S6000/qos.json.j2,/tmp/cfg_qos.json -y /etc/sonic/sonic_version.yml", "Running command: /usr/local/bin/sonic-cfggen -j /tmp/cfg_buffer.json -j /tmp/cfg_qos.json --write-to-db", "Running command: pfcwd start_default", "Running command: config override-config-table /etc/sonic/golden_config_db.json", "Removing configDB overriden table first ...", "Overriding input config to configDB ...", "Overriding completed. No service is restarted.", "Restarting SONiC target ...", "Reloading Monit configuration ...", "Reinitializing monit daemon", "Enabling container and routeCheck monitoring ...", "Released lock on /etc/sonic/reload.lock"]}
"non-zero return code", "rc": 1, "start": "2024-12-23 05:59:45.948683", "stderr": "Unix socket /var/run/monit.sock connection error -- No such file or directory", "stderr_lines": ["Unix socket /var/run/monit.sock connection error -- No such file or directory"],
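
To tell this failure apart from other non-zero exits when triaging test plans, the Ansible result can be checked for the monit socket message. A small sketch, assuming the result is available as a dict with the `rc` and `stderr_lines` keys shown above; `is_monit_socket_failure` is a hypothetical helper:

```python
MONIT_SOCKET_ERROR = "Unix socket /var/run/monit.sock connection error"


def is_monit_socket_failure(result):
    """Return True if a command result failed with the monit socket error."""
    if result.get("rc", 0) == 0:
        return False
    return any(MONIT_SOCKET_ERROR in line for line in result.get("stderr_lines", []))


# Fields copied from the failure above; all other keys are omitted.
sample = {
    "rc": 1,
    "msg": "non-zero return code",
    "stderr_lines": [
        "Unix socket /var/run/monit.sock connection error -- No such file or directory"
    ],
}

if __name__ == "__main__":
    print(is_monit_socket_failure(sample))
```

This only works when stderr is captured, so it would not help for the runs that redirect output to /dev/null.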

In the nightly tests the command runs with &>/dev/null, so stdout and stderr are empty, but the return code is still 1.
The same error message also appears in syslog. The test failure:

        if (res.is_failed or 'exception' in res) and not module_ignore_errors:
>           raise RunAnsibleModuleFail("run module {} failed".format(self.module_name), res)
E           tests.common.errors.RunAnsibleModuleFail: run module shell failed, Ansible Results =>
E           failed = True
E           changed = True
E           rc = 1
E           cmd = config reload -y -f -l /etc/sonic/running_golden_config.json,/etc/sonic/running_golden_config0.json,/etc/sonic/running_golden_config1.json,/etc/sonic/running_golden_config2.json,/etc/sonic/running_golden_config3.json,/etc/sonic/running_golden_config4.json,/etc/sonic/running_golden_config5.json,/etc/sonic/running_golden_config6.json,/etc/sonic/running_golden_config7.json,/etc/sonic/running_golden_config8.json,/etc/sonic/running_golden_config9.json,/etc/sonic/running_golden_config10.json,/etc/sonic/running_golden_config11.json,/etc/sonic/running_golden_config12.json,/etc/sonic/running_golden_config13.json,/etc/sonic/running_golden_config14.json,/etc/sonic/running_golden_config15.json &>/dev/null
E           start = 2024-12-21 11:59:12.553731
E           end = 2024-12-21 12:01:31.501457
E           delta = 0:02:18.947726
E           msg = non-zero return code
E           invocation = {'module_args': {'executable': '/bin/bash', '_raw_params': 'config reload -y -f -l /etc/sonic/running_golden_config.json,/etc/sonic/running_golden_config0.json,/etc/sonic/running_golden_config1.json,/etc/sonic/running_golden_config2.json,/etc/sonic/running_golden_config3.json,/etc/sonic/running_golden_config4.json,/etc/sonic/running_golden_config5.json,/etc/sonic/running_golden_config6.json,/etc/sonic/running_golden_config7.json,/etc/sonic/running_golden_config8.json,/etc/sonic/running_golden_config9.json,/etc/sonic/running_golden_config10.json,/etc/sonic/running_golden_config11.json,/etc/sonic/running_golden_config12.json,/etc/sonic/running_golden_config13.json,/etc/sonic/running_golden_config14.json,/etc/sonic/running_golden_config15.json &>/dev/null', '_uses_shell': True, 'warn': False, 'stdin_add_newline': True, 'strip_empty_ends': True, 'argv': None, 'chdir': None, 'creates': None, 'removes': None, 'stdin': None}}
E           _ansible_no_log = None
E           stdout =
E           stderr =

Syslog around the time of the failure:

2024 Dec 21 12:01:31.079457 xx-sup-1 INFO containerd[1029]: time="2024-12-21T12:01:31.078208904Z" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
2024 Dec 21 12:01:31.079546 xx-sup-1 INFO containerd[1029]: time="2024-12-21T12:01:31.078300819Z" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
2024 Dec 21 12:01:31.079589 xx-sup-1 INFO containerd[1029]: time="2024-12-21T12:01:31.078323265Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
2024 Dec 21 12:01:31.079625 xx-sup-1 INFO containerd[1029]: time="2024-12-21T12:01:31.078597811Z" level=info msg="starting signal loop" namespace=moby path=/run/containerd/io.containerd.runtime.v2.task/moby/3f6942e3e7c5cde070975a37a7ad066be50ee69308231e51ddfdcdb93a0d961b pid=1305728 runtime=io.containerd.runc.v2
2024 Dec 21 12:01:31.096709 str3-8800-sup-1 ERR monit[1305720]: Unix socket /var/run/monit.sock connection error -- No such file or directory
2024 Dec 21 12:01:31.267612 xx-sup-1 INFO container: docker cmd: start for radv
2024 Dec 21 12:01:31.269724 xx-sup-1 DEBUG container: container_start: END
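
When stdout and stderr are discarded, syslog is the only place the error shows up, so it can be useful to pull the monit socket errors out of it with their timestamps. A sketch, assuming the syslog line layout seen above; the regular expression and `find_monit_socket_errors` are illustrative and not taken from any SONiC tool:

```python
import re
from datetime import datetime

# Matches lines such as:
# 2024 Dec 21 12:01:31.096709 <host> ERR monit[<pid>]: Unix socket <path> connection error -- <reason>
MONIT_ERR = re.compile(
    r"^(?P<ts>\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<host>\S+) ERR monit\[(?P<pid>\d+)\]: "
    r"Unix socket (?P<sock>\S+) connection error -- (?P<reason>.*)$"
)


def find_monit_socket_errors(lines):
    """Return (timestamp, host, pid, socket_path, reason) for each matching line."""
    hits = []
    for line in lines:
        m = MONIT_ERR.match(line.strip())
        if m:
            # Collapse repeated spaces (e.g. single-digit days) before parsing.
            ts_text = " ".join(m["ts"].split())
            ts = datetime.strptime(ts_text, "%Y %b %d %H:%M:%S.%f")
            hits.append((ts, m["host"], int(m["pid"]), m["sock"], m["reason"]))
    return hits


# Two lines copied from the syslog excerpt above.
SAMPLE_LINES = [
    "2024 Dec 21 12:01:31.096709 str3-8800-sup-1 ERR monit[1305720]: "
    "Unix socket /var/run/monit.sock connection error -- No such file or directory",
    "2024 Dec 21 12:01:31.267612 xx-sup-1 INFO container: docker cmd: start for radv",
]

if __name__ == "__main__":
    for hit in find_monit_socket_errors(SAMPLE_LINES):
        print(hit)
```

Comparing these timestamps with the `end` time of the failed command (12:01:31.501457 in the run above) shows how close the socket error is to the command exit.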

Describe the results you expected:

config reload should complete successfully with return code 0.

Output of show version:

Both the internal-202405 and the GitHub 202405 images have this issue.

@yejianquan
Contributor Author

@abdosi could you please take a look at this?

@yejianquan
Contributor Author

@bingwang-ms for viz

@yejianquan yejianquan added the P1 Priority of the issue, lower than P0 label Dec 24, 2024
@yejianquan
Contributor Author

@vperumal @anamehra for viz
