bookworm container upgrade for lldp, net-snmp #18150

Merged
merged 23 commits into from
Apr 22, 2024

Conversation

mohan-selvaraj
Copy link
Contributor

@mohan-selvaraj mohan-selvaraj commented Feb 21, 2024

Why I did it

Update LLDP, net-snmp containers to Bookworm

Work item tracking
  • Microsoft ADO (number only):

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Copy link

linux-foundation-easycla bot commented Feb 21, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@saiarcot895
Copy link
Contributor

The PR build is failing because of build failures in net-snmp. Please look into these failures. The package may need to be updated to 5.9.3+dfsg-2 (similar to the upgrade done for Bullseye).

@mohan-selvaraj mohan-selvaraj changed the title bookworm container upgrade for lldp bookworm container upgrade for lldp, net-snmp Feb 27, 2024
@mohan-selvaraj
Copy link
Contributor Author

> The PR build is failing because of build failures in net-snmp. Please look into these failures. The package may need to be updated to 5.9.3+dfsg-2 (similar to the upgrade done for Bullseye).

5.9.3+dfsg-2 wasn't available in sonicblob, so the download failed. I used the existing package, 5.9+dfsg-4, for net-snmp. The following errors now appear in the Bookworm build but not in the Bullseye build.

Any input on resolving this?

2024-02-27T16:28:43.1201411Z [ FAIL LOG START ] [ target/debs/bookworm/libnl-nf-3-dev_3.5.0-1_amd64.deb-install ]
2024-02-27T16:28:43.1226890Z Build start time: Tue Feb 27 16:28:32 UTC 2024
2024-02-27T16:28:43.1227257Z dpkg: error: dpkg frontend lock was locked by another process with pid 197664
2024-02-27T16:28:43.1227605Z Note: removing the lock file is always wrong, can damage the locked area
2024-02-27T16:28:43.1228270Z and the entire system. See <https://wiki.debian.org/Teams/Dpkg/FAQ#db-lock>.
2024-02-27T16:28:43.1229042Z [  FAIL LOG END  ] [ target/debs/bookworm/libnl-nf-3-dev_3.5.0-1_amd64.deb-install ]

2024-02-27T16:28:43.3395027Z make: *** [slave.mk:848: target/debs/bookworm/libnl-nf-3-dev_3.5.0-1_amd64.deb-install] Error 1
2024-02-27T16:28:43.3395389Z make: *** Waiting for unfinished jobs....

2024-02-27T16:30:11.9310522Z transports/snmpTLSBaseDomain.c:59:22: error: static declaration of 'ERR_get_error_all' follows non-static declaration
2024-02-27T16:30:11.9310751Z    59 | static unsigned long ERR_get_error_all(const char **file, int *line,
2024-02-27T16:30:11.9310991Z       |                      ^~~~~~~~~~~~~~~~~

2024-02-27T16:30:11.9313469Z make[4]: *** [Makefile:101: transports/snmpTLSBaseDomain.lo] Error 1
2024-02-27T16:30:11.9313629Z make[4]: *** Waiting for unfinished jobs....
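
A minimal sketch of how the failing targets and compiler errors could be pulled out of a build log like the one above. The function name `summarize_build_log` and both regular expressions are assumptions inferred only from the lines quoted here; they are not part of the SONiC build system.

```python
import re

# Both patterns are inferred from the log excerpt above (assumptions).
FAIL_START = re.compile(r"\[ FAIL LOG START \] \[ (?P<target>\S+) \]")
GCC_ERROR = re.compile(
    r"(?P<file>[^\s:]+):(?P<line>\d+):(?P<col>\d+): error: (?P<msg>.+)"
)


def summarize_build_log(text):
    """Return (failed_targets, compiler_errors) found in a build log."""
    failed_targets = []
    compiler_errors = []
    for raw in text.splitlines():
        match = FAIL_START.search(raw)
        if match:
            failed_targets.append(match.group("target"))
            continue
        match = GCC_ERROR.search(raw)
        if match:
            compiler_errors.append(
                (match.group("file"), int(match.group("line")), match.group("msg"))
            )
    return failed_targets, compiler_errors


# Three lines copied from the log above.
SAMPLE = """\
2024-02-27T16:28:43.1201411Z [ FAIL LOG START ] [ target/debs/bookworm/libnl-nf-3-dev_3.5.0-1_amd64.deb-install ]
2024-02-27T16:28:43.1227257Z dpkg: error: dpkg frontend lock was locked by another process with pid 197664
2024-02-27T16:30:11.9310522Z transports/snmpTLSBaseDomain.c:59:22: error: static declaration of 'ERR_get_error_all' follows non-static declaration
"""

targets, errors = summarize_build_log(SAMPLE)
for target in targets:
    print("failed target:", target)
for path, line_no, message in errors:
    print(f"{path}:{line_no}: {message}")
```

Separating the two kinds of entries makes it easier to see that the dpkg lock failure (reported for the libnl-nf-3-dev install target) and the `ERR_get_error_all` compile error (in `transports/snmpTLSBaseDomain.c`) appear in different places in the log.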


@saiarcot895
Copy link
Contributor

5.9.3+dfsg-2 should now be available from sonicblob, can you test it?

Makefile.work Outdated
@@ -311,8 +311,8 @@ endif
ifeq ($(DOCKER_BUILDER_WORKDIR),)
override DOCKER_BUILDER_WORKDIR := "/sonic"
endif

DOCKER_RUN := docker run --rm=true --privileged --init \
Copy link
Contributor

It looks like there are a lot of changes here. Are these meant to be local/debug changes?

Copy link
Contributor Author

These changes were added automatically after rebasing onto the master branch.

Copy link
Contributor

You may need to recheck the branch rebases, there are changes in this PR that shouldn't be here.

Copy link
Collaborator

@maipbui, please keep an eye on this.

Copy link
Contributor Author

Those changes were made for a local build and got pushed by mistake. I have removed them in the latest commit.

Copy link
Contributor Author

The build went through fine. The failure is in the kvmtest -> Prepare testbed step, but I am unable to access the detailed logs. Do you have any input on this?

Copy link
Contributor

I thought the issue of not being able to see the test logs was fixed. At any rate, here is the failure:

12:54:21 recover.adaptive_recover                 L0169 WARNING| Restoring {'failed': True, 'check_item': 'processes', 'host': 'vlab-01', 'processes_status': {'pmon': {'status': True, 'exited_critical_process': [], 'running_critical_process': []}, 'snmp': {'status': False, 'exited_critical_process': ['snmp-subagent'], 'running_critical_process': ['snmpd']}, 'lldp': {'status': True, 'exited_critical_process': [], 'running_critical_process': ['lldp-syncd', 'lldpd', 'lldpmgrd']}, 'database': {'status': True, 'exited_critical_process': [], 'running_critical_process': ['redis']}, 'bgp': {'status': True, 'exited_critical_process': [], 'running_critical_process': ['bgpcfgd', 'bgpd', 'fpmsyncd', 'staticd', 'zebra']}, 'swss': {'status': True, 'exited_critical_process': [], 'running_critical_process': ['buffermgrd', 'coppmgrd', 'fabricmgrd', 'fdbsyncd', 'intfmgrd', 'nbrmgrd', 'neighsyncd', 'orchagent', 'portmgrd', 'portsyncd', 'tunnelmgrd', 'vlanmgrd', 'vrfmgrd', 'vxlanmgrd']}, 'syncd': {'status': True, 'exited_critical_process': [], 'running_critical_process': ['syncd']}, 'teamd': {'status': True, 'exited_critical_process': [], 'running_critical_process': ['teammgrd', 'teamsyncd', 'tlm_teamd']}}, 'services_status': {'pmon': True, 'snmp': False, 'lldp': True, 'database': True, 'bgp': True, 'swss': True, 'syncd': True, 'teamd': True}} with proposed action: config_reload, final action: config_reload

The snmp-subagent process is supposed to be running, but the log shows it as an exited critical process.
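
The `processes_status` field in the log line above is printed as a Python dictionary literal, so it can be inspected directly. The sketch below uses a trimmed excerpt of that dictionary; the helper name `exited_processes` is hypothetical and not part of the sonic-mgmt test code.

```python
import ast

# Trimmed excerpt of the 'processes_status' dictionary from the log line above.
EXCERPT = (
    "{'pmon': {'status': True, 'exited_critical_process': [], "
    "'running_critical_process': []}, "
    "'snmp': {'status': False, 'exited_critical_process': ['snmp-subagent'], "
    "'running_critical_process': ['snmpd']}, "
    "'lldp': {'status': True, 'exited_critical_process': [], "
    "'running_critical_process': ['lldp-syncd', 'lldpd', 'lldpmgrd']}}"
)


def exited_processes(processes_status):
    """Map each unhealthy container to its exited critical processes."""
    return {
        name: info["exited_critical_process"]
        for name, info in processes_status.items()
        if not info["status"]
    }


status = ast.literal_eval(EXCERPT)  # safe parsing of a literal, no code execution
failed = exited_processes(status)
print(failed)
```

For this excerpt, only the `snmp` container is reported, with `snmp-subagent` as its exited critical process.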

Copy link
Contributor Author

The following patches are already in the 5.9.3 codebase, so they need not be included:
0001-SNMP-Stop-spamming-logs-with-statfs-permission-denie.patch
0002-at.c-properly-check-return-status-from-realloc.-Than.patch
0003-CHANGES-BUG-2743-snmpd-crashes-when-receiving-a-GetN.patch
0006-From-Jiri-Cervenka-snmpd-Fixed-agentx-crashing-and-or-freezing-on-timeout.patch
0009-Makefile.in-agent-Makefile.in-Fix-parallel-compilati.patch
0010-Makefile.in-Make-sure-that-sedscript-is-built-before.patch
0011-agent-Makefile.in-Build-the-MIB-module-code-once.patch

The following patch looks specific to version 5.7.3, so I will not add it:
0007-Linux-VRF-5.7.3-Support.patch

The following patch is described as "Enable macro DEB_BUILD_ARCH_OS in order to build ipv6 feature", but it doesn't apply:
0008-Enable-macro-DEB_BUILD_ARCH_OS-in-order-to-build-ipv.patch

The following patch is required but doesn't apply:
cross-compile-changes.patch

The following patches are required:
0012-agent-Makefile.in-Unbreak-the-enable-minimalist-buil.patch
0013-enable-parallel-build-for-net-snmp.patch

The next step is to add the following patches to 5.9.3:
0008-Enable-macro-DEB_BUILD_ARCH_OS-in-order-to-build-ipv.patch
0012-agent-Makefile.in-Unbreak-the-enable-minimalist-buil.patch
0013-enable-parallel-build-for-net-snmp.patch
cross-compile-changes.patch
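
The triage in this comment can be written down as data, which makes the resulting patch list easy to regenerate. This is only a sketch: the category labels and the `build_series` helper are hypothetical, and the classification reflects this comment alone, not the revisions requested later in the review.

```python
# Classification copied from the comment above; the category labels are illustrative.
TRIAGE = {
    "already_in_5.9.3": [
        "0001-SNMP-Stop-spamming-logs-with-statfs-permission-denie.patch",
        "0002-at.c-properly-check-return-status-from-realloc.-Than.patch",
        "0003-CHANGES-BUG-2743-snmpd-crashes-when-receiving-a-GetN.patch",
        "0006-From-Jiri-Cervenka-snmpd-Fixed-agentx-crashing-and-or-freezing-on-timeout.patch",
        "0009-Makefile.in-agent-Makefile.in-Fix-parallel-compilati.patch",
        "0010-Makefile.in-Make-sure-that-sedscript-is-built-before.patch",
        "0011-agent-Makefile.in-Build-the-MIB-module-code-once.patch",
    ],
    "specific_to_5.7.3": ["0007-Linux-VRF-5.7.3-Support.patch"],
    # Wanted in the series, but the existing patch files do not apply as-is.
    "needs_rework": [
        "0008-Enable-macro-DEB_BUILD_ARCH_OS-in-order-to-build-ipv.patch",
        "cross-compile-changes.patch",
    ],
    "required": [
        "0012-agent-Makefile.in-Unbreak-the-enable-minimalist-buil.patch",
        "0013-enable-parallel-build-for-net-snmp.patch",
    ],
}


def build_series(triage):
    """Return the patches to carry forward, in filename order."""
    return sorted(triage["needs_rework"] + triage["required"])


series = build_series(TRIAGE)
print("\n".join(series))
```

Sorting by filename keeps the numbered patches in order and places `cross-compile-changes.patch` last, which matches the order of the "next step" list above.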

Copy link
Contributor Author

snmp-subagent still isn't up. How can I access the process logs?

"docker exec snmp supervisorctl status"
dependent-startup                RUNNING   pid 7, uptime 0:00:10
rsyslogd                         RUNNING   pid 20, uptime 0:00:08
snmp-subagent                    BACKOFF   Exited too quickly (process log may have details)
snmpd                            RUNNING   pid 24, uptime 0:00:08
start                            EXITED    Mar 15 11:03 AM
supervisor-proc-exit-listener    RUNNING   pid 8, uptime 0:00:10
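
A small sketch for spotting the failing program in `supervisorctl status` output such as the listing above. `BACKOFF` and `FATAL` are supervisor process states; treating only those two as unhealthy, and ignoring `EXITED` because one-shot programs such as `start` finish normally, is an assumption made for this example.

```python
# Output copied from the listing above.
STATUS_OUTPUT = """\
dependent-startup                RUNNING   pid 7, uptime 0:00:10
rsyslogd                         RUNNING   pid 20, uptime 0:00:08
snmp-subagent                    BACKOFF   Exited too quickly (process log may have details)
snmpd                            RUNNING   pid 24, uptime 0:00:08
start                            EXITED    Mar 15 11:03 AM
supervisor-proc-exit-listener    RUNNING   pid 8, uptime 0:00:10
"""

# Assumption: only these states indicate a problem; EXITED is ignored on purpose.
UNHEALTHY_STATES = {"BACKOFF", "FATAL"}


def parse_status(output):
    """Return {program_name: state} from 'supervisorctl status' output."""
    states = {}
    for line in output.splitlines():
        parts = line.split(None, 2)  # name, state, free-form description
        if len(parts) >= 2:
            states[parts[0]] = parts[1]
    return states


def unhealthy(states):
    """Return the sorted names of programs in an unhealthy state."""
    return sorted(name for name, state in states.items() if state in UNHEALTHY_STATES)


states = parse_status(STATUS_OUTPUT)
print(unhealthy(states))
```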

Copy link
Contributor Author

@qiluo-msft,
I need your help understanding exactly which failure is causing snmp-subagent to exit.

@mohan-selvaraj
Copy link
Contributor Author

> 5.9.3+dfsg-2 should now be available from sonicblob, can you test it?

Sure, I will try it and post an update.

@qiluo-msft qiluo-msft requested a review from maipbui March 7, 2024 21:04
@saiarcot895 saiarcot895 mentioned this pull request Mar 7, 2024
11 tasks
@mohan-selvaraj
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@mohan-selvaraj
Copy link
Contributor Author

@qiluo-msft,

I have added the patches in snmpd based on a code comparison.
I have also rebased the lldp_bkwm branch onto the latest master.
The snmp-subagent process still doesn't come up.

"docker exec snmp supervisorctl status"
dependent-startup                RUNNING   pid 7, uptime 0:00:10
rsyslogd                         RUNNING   pid 20, uptime 0:00:08
snmp-subagent                    BACKOFF   Exited too quickly (process log may have details)
snmpd                            RUNNING   pid 24, uptime 0:00:08
start                            EXITED    Mar 15 11:03 AM
supervisor-proc-exit-listener    RUNNING   pid 8, uptime 0:00:10

I need your help understanding exactly which failure is causing snmp-subagent to exit.

@saiarcot895
Copy link
Contributor

@mohan-selvaraj, you should be able to see this issue when loading the image on a setup (either KVM or physical). After the image is loaded, check /var/log/syslog to see why snmp-subagent is exiting/failing to start.

@mohan-selvaraj
Copy link
Contributor Author

An additional pull request has been created for the changes in src/snmp-agent:
sonic-net/sonic-snmpagent#313

With the changes in snmpagent, the process status is as follows:

root@sonic:/home/admin# docker exec snmp supervisorctl status
dependent-startup                EXITED    Mar 29 09:00 AM
rsyslogd                         RUNNING   pid 20, uptime 0:24:55
snmp-subagent                    RUNNING   pid 25, uptime 0:24:53
snmpd                            RUNNING   pid 24, uptime 0:24:55
start                            EXITED    Mar 29 08:59 AM
supervisor-proc-exit-listener    RUNNING   pid 8, uptime 0:24:56
root@sonic:/home/admin#
root@sonic:/home/admin# docker exec -it snmp bash
root@sonic:/# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 08:59 pts/0    00:00:00 /usr/bin/python3 /usr/local/bin/supervisord
root           8       1  0 08:59 pts/0    00:00:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name snmp
root          20       1  0 08:59 pts/0    00:00:00 /usr/sbin/rsyslogd -n -iNONE
Debian-+      24       1  0 08:59 pts/0    00:00:00 /usr/sbin/snmpd -f -Ls0-2d -u Debian-snmp -g Debian-snmp -I -smux mteTrigger mteTriggerConf ifTable ifXTable inetCidrRouteTable ipCidrRouteTable ip disk_hw -p /run/snmpd.pid
root          25       1  5 08:59 pts/0    00:01:22 python3 -m sonic_ax_impl
root         172       0  2 09:23 pts/1    00:00:00 bash
root         178     172  0 09:23 pts/1    00:00:00 ps -ef

Syslog output:

...
Mar 29 09:00:00.981657 sonic INFO snmp#snmp-subagent [ax_interface] INFO: Registering subID: [.1.3.6.1.2.1.2.2.1.21]
Mar 29 09:00:00.981910 sonic INFO snmp#snmp-subagent [ax_interface] INFO: Registering subID: [.1.3.6.1.2.1.2.2.1.22]
Mar 29 09:00:00.982043 sonic INFO snmp#snmp-subagent [ax_interface] INFO: OID registration complete. Waiting to receive PDUs...
Mar 29 09:00:00.989625 sonic INFO snmp#supervisord 2024-03-29 09:00:00,988 INFO exited: dependent-startup (exit status 0; expected)
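
To check why a process exits or to confirm that it started, the syslog can be filtered down to a single process. The sketch below assumes the line layout seen in the excerpt above (timestamp, host, level, `container#process`, message); the regular expression and the `subagent_messages` helper are hypothetical, and reading `/var/log/syslog` itself is left out.

```python
import re

# Layout inferred from the excerpt above (an assumption):
# "<Mon DD HH:MM:SS.ffffff> <host> <LEVEL> <container>#<process> <message>"
LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ [\d:.]+) (?P<host>\S+) (?P<level>[A-Z]+) "
    r"(?P<container>[^#\s]+)#(?P<process>\S+) (?P<msg>.*)$"
)


def subagent_messages(lines, process="snmp-subagent"):
    """Return (level, message) pairs logged by the given process."""
    found = []
    for line in lines:
        match = LINE.match(line)
        if match and match.group("process") == process:
            found.append((match.group("level"), match.group("msg")))
    return found


# Two lines copied from the syslog excerpt above.
SAMPLE = [
    "Mar 29 09:00:00.982043 sonic INFO snmp#snmp-subagent [ax_interface] INFO: "
    "OID registration complete. Waiting to receive PDUs...",
    "Mar 29 09:00:00.989625 sonic INFO snmp#supervisord 2024-03-29 09:00:00,988 "
    "INFO exited: dependent-startup (exit status 0; expected)",
]

messages = subagent_messages(SAMPLE)
for level, text in messages:
    print(level, text)
```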

@mohan-selvaraj
Copy link
Contributor Author

/azpw run

@mohan-selvaraj
Copy link
Contributor Author

/AzurePipelines run

Copy link

Commenter does not have sufficient privileges for PR 18150 in repo sonic-net/sonic-buildimage

@mohan-selvaraj
Copy link
Contributor Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-buildimage

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@saiarcot895
Copy link
Contributor

@mohan-selvaraj Until the snmpagent submodule gets merged, it doesn't make sense to rerun the pipeline here.

@adyeung
Copy link
Collaborator

adyeung commented Apr 4, 2024

Blocked on #313, expecting build test to pass after the merge of snmp subagent change

@saiarcot895
Copy link
Contributor

@mohan-selvaraj could you trigger a rebuild when you get a chance?

@@ -0,0 +1,4 @@
0008-Enable-macro-DEB_BUILD_ARCH_OS-in-order-to-build-ipv.patch
Copy link
Contributor

I see 0001-SNMP-Stop-spamming-logs-with-statfs-permission-denie.patch, 0009-Makefile.in-agent-Makefile.in-Fix-parallel-compilati.patch, 0010-Makefile.in-Make-sure-that-sedscript-is-built-before.patch, 0011-agent-Makefile.in-Build-the-MIB-module-code-once.patch, and 0012-agent-Makefile.in-Unbreak-the-enable-minimalist-buil.patch were dropped. But it doesn't seem like the changes in those patches are in 5.9.3. Is there a reason these patches were dropped?

Copy link
Contributor Author

Can you confirm whether the code in the dropped patches is missing from the 5.9.3 codebase?
I had an earlier comment on which patches were required and which could be ignored.

Copy link
Contributor

@saiarcot895 saiarcot895 Apr 16, 2024

0001-SNMP-Stop-spamming-logs-with-statfs-permission-denie.patch is needed, 0008-Enable-macro-DEB_BUILD_ARCH_OS-in-order-to-build-ipv.patch can be optionally dropped (since that logic is now coming from include /usr/share/dpkg/architecture.mk in debian/rules), 0012-agent-Makefile.in-Unbreak-the-enable-minimalist-buil.patch is needed. The others are correct.

Copy link
Contributor

@saiarcot895
As I understand it, the only reason we build the snmp packages (libsnmp-base, snmptrapd, snmp, snmpd, libsnmp40, libsnmp-dev, libsnmp-perl, tkmib) from source is 0001-SNMP-Stop-spamming-logs-with-statfs-permission-denie.patch. Maybe it's possible to find another solution without modifying the source code.

Copy link
Contributor

Good point. Based on what's in sonic-net/sonic-snmpagent#22, there might not be any changes needed at all. For now, I prefer to let this get merged in, and then look at removing the snmp build (either in time for 202405 release or after the branch cutoff).

Copy link
Contributor

@k-v1 k-v1 Apr 19, 2024

@saiarcot895

We should remove 0001-SNMP-Stop-spamming-logs-with-statfs-permission-denie.patch.
fsys_mntctl.c is for AIX: https://github.com/net-snmp/net-snmp/blob/59acd6e6fcfccfd3456ec8a65816ca76036e142f/agent/mibgroup/hardware/fsys.h#L2
The original bug in fsys_mntent.c has already been fixed upstream: bvanassche/net-snmp@5f1986c

@yxieca yxieca merged commit df499b6 into sonic-net:master Apr 22, 2024
19 checks passed
8 participants