
[action] [PR:15253] feat: add more sanity checks for T2 #16277

Merged
1 commit merged on Jan 2, 2025


mssonicbld (Collaborator)

Description of PR

Add a BFD up-count check and a MAC entry count check to the sanity checks for the T2 topology.

Summary:
Fixes Microsoft ADO 29825439 and 29825466

Type of change

  • Bug fix
  • Testbed and Framework (new/improvement)
  • Test case (new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405
  • 202411

Approach

What is the motivation for this PR?

During our T2 nightly runs, we found that the port channel between two ASICs can occasionally come up while the MAC address is not learned and the BFD session between the ASICs stays down. We therefore need sanity checks that verify all BFD sessions are up and all MAC addresses are learned; otherwise, issues like this will affect test results and could impact production environments.

How did you do it?

  1. Added a check_bfd_up_count() function to the sanity checks, for the T2 topology only. This check takes about 4 seconds on a T2 device with 3 linecards (LCs, the frontend nodes).
  2. Added a check_mac_entry_count() function to the sanity checks, for the T2 supervisor only. This check takes about 17 seconds on a T2 device whose supervisor has 10 ASICs.
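The two checks above could look roughly like the following. This is a minimal, hypothetical sketch, not the code merged in this PR: only the function names come from the description, while the signatures are illustrative. It assumes the per-ASIC BFD summary has already been collected as text lines with the session state as its own column, that learned MAC addresses have been collected per ASIC as strings, and that the expected count per ASIC is known in advance. The `CheckResult` type and the expected-count dictionaries are also illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CheckResult:
    """Illustrative result type (hypothetical, not the sonic-mgmt one)."""
    name: str
    failed: bool
    details: Dict[str, str] = field(default_factory=dict)


def count_bfd_up(summary_lines: List[str]) -> int:
    # Assumption: one BFD session per line, with its state ("Up", "Down", ...)
    # as a whitespace-separated column.
    return sum(1 for line in summary_lines if "Up" in line.split())


def check_bfd_up_count(per_asic_summary: Dict[str, List[str]],
                       expected_up: Dict[str, int]) -> CheckResult:
    """Fail if any ASIC has fewer BFD sessions Up than expected."""
    details = {}
    for asic, want in expected_up.items():
        # An ASIC with no collected output counts as zero sessions Up.
        up = count_bfd_up(per_asic_summary.get(asic, []))
        if up < want:
            details[asic] = f"BFD up count {up}, expected {want}"
    return CheckResult("bfd_up_count", bool(details), details)


def check_mac_entry_count(per_asic_macs: Dict[str, List[str]],
                          expected_macs: Dict[str, int]) -> CheckResult:
    """Fail if any ASIC has learned fewer distinct MAC entries than expected."""
    details = {}
    for asic, want in expected_macs.items():
        # Normalize case so "AA:BB:..." and "aa:bb:..." count only once.
        learned = len({mac.lower() for mac in per_asic_macs.get(asic, [])})
        if learned < want:
            details[asic] = f"{learned} MAC entries learned, expected {want}"
    return CheckResult("mac_entry_count", bool(details), details)


if __name__ == "__main__":
    bfd = check_bfd_up_count(
        {"asic0": ["10.0.0.1 Ethernet0 default Up",
                   "10.0.0.3 Ethernet4 default Down"]},
        {"asic0": 2},
    )
    print(bfd.failed, bfd.details)  # True {'asic0': 'BFD up count 1, expected 2'}
```

Iterating over the expected counts rather than over the collected output means an ASIC that returned nothing is reported as a failure instead of being silently skipped.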

How did you verify/test it?

I ran the updated code on T2 with multiple test modules and confirmed that it checks the BFD up count and MAC entry count correctly. Elastictest link: https://elastictest.org/scheduler/testplan/676bbfe8ab42af53500adb8d?leftSideViewMode=detail

I also confirmed that these two checks are skipped on non-T2 devices.
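Skipping the checks on non-T2 devices amounts to gating them on topology and node role. The sketch below assumes, as the description suggests, that the BFD check runs on the linecards (frontend nodes) and the MAC check on the supervisor. The `checks_for_node` helper, the topology and role strings, and the placeholder names in `COMMON_CHECKS` are hypothetical and are not taken from sonic-mgmt.

```python
from typing import List

# Hypothetical placeholders standing in for the pre-existing sanity checks.
COMMON_CHECKS = ["check_services", "check_interfaces"]


def checks_for_node(topo_type: str, node_role: str) -> List[str]:
    """Return the sanity checks to run on one DUT node (sketch only).

    topo_type: testbed topology type, e.g. "t2" or "t1" (assumed naming).
    node_role: "frontend" for a linecard, "supervisor" for the supervisor.
    """
    checks = list(COMMON_CHECKS)
    if topo_type != "t2":
        # Non-T2 testbeds skip both new checks entirely.
        return checks
    if node_role == "frontend":
        checks.append("check_bfd_up_count")
    elif node_role == "supervisor":
        checks.append("check_mac_entry_count")
    return checks


if __name__ == "__main__":
    for topo, role in [("t1", "frontend"), ("t2", "frontend"), ("t2", "supervisor")]:
        print(topo, role, checks_for_node(topo, role))
```

Returning a fresh list each time keeps the shared `COMMON_CHECKS` list from being mutated by per-node additions.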

Any platform specific information?

Supported testbed topology if it's a new test case?

T2

Documentation


Co-authored by: [email protected]
mssonicbld (Collaborator, Author)

/azp run

mssonicbld (Collaborator, Author)

Original PR: #15253


Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld merged commit 4e9553a into sonic-net:202411 on Jan 2, 2025. 16 checks passed.