Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ip/test_ip_packet.py Increase tx_ok tolerance from 200 to 400 on T2 #16005

Merged

Conversation

arista-nwolfe
Copy link
Contributor

@arista-nwolfe arista-nwolfe commented Dec 11, 2024

#7954 described an issue we saw on T2 platforms where test_ip_packet.py::test_forward_ip_packet_with_0xffff_chksum_drop would fail sporadically if sufficient control/background traffic egressed the out_ifaces.

#8414 was created to fix this by increasing the tolerance from 100 to 200 packets. However, it seems like this wasn't sufficient tolerance as we're seeing failures on T2 devices:

Failed: Forwarded 276 packets in tx, not in expected range
Failed: Forwarded 275 packets in tx, not in expected range
Failed: Forwarded 264 packets in tx, not in expected range
Failed: Forwarded 295 packets in tx, not in expected range
Failed: Forwarded 316 packets in tx, not in expected range
Failed: Forwarded 301 packets in tx, not in expected range
Failed: Forwarded 264 packets in tx, not in expected range

But if we look at the RX_DROP counters we see all 1000 packets were dropped on RX:

root@cmp235-3:~# show interfaces counters
Last cached time was 2024-12-10T22:16:35.704269
      IFACE    STATE    RX_OK      RX_BPS    RX_UTIL    RX_ERR    RX_DRP
-----------  -------  -------  ----------  ---------  --------  --------
Ethernet144        U   15,399  141.74 B/s      0.00%         0     1,000

So the packets captured on TX was background traffic.

Summary:
Fixes #7954

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

@wenyiz2021
Copy link
Contributor

wenyiz2021 commented Dec 13, 2024

I'm ok with this change, but have a dumb question:

is it enough if we make sure Rx_drop is 1000?
e.g. if Rx_drop is 1000, we can skip the range asserts because no packets are egress-ed and it's the most ideal case. if it is not exact match, we can check the range with tolerance?

@arista-nwolfe
Copy link
Contributor Author

I'm ok with this change, but have a dumb question:

is it enough if we make sure Rx_drop is 1000? e.g. if Rx_drop is 1000, we can skip the range asserts because no packets are egress-ed and it's the most ideal case. if it is not exact match, we can check the range with tolerance?

Yeah IMO the TX_OK check seems like a weird one in this test, we should just check that all the packets were dropped on RX and that should be sufficient to indicate a passing test. But I'm not sure if certain platforms were failing because not all packets were being dropped or counted hence the tolerance was introduced? I figured rewriting the test to just check RX_DROP might cause issues on other platforms.

@wenyiz2021
Copy link
Contributor

I'm ok with this change, but have a dumb question:
is it enough if we make sure Rx_drop is 1000? e.g. if Rx_drop is 1000, we can skip the range asserts because no packets are egress-ed and it's the most ideal case. if it is not exact match, we can check the range with tolerance?

Yeah IMO the TX_OK check seems like a weird one in this test, we should just check that all the packets were dropped on RX and that should be sufficient to indicate a passing test. But I'm not sure if certain platforms were failing because not all packets were being dropped or counted hence the tolerance was introduced? I figured rewriting the test to just check RX_DROP might cause issues on other platforms.

I mean something like:

if Rx_drop is 1000:
return True
pytest_aserts1
pytest_aserts2
...

how about this?

@wenyiz2021
Copy link
Contributor

because we were increasing the Tx_ok tolerance the 2nd time, and it seems is not a necessary check

@rlhui rlhui merged commit 2ded2f2 into sonic-net:master Dec 14, 2024
18 checks passed
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Dec 17, 2024
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202405: #16097

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[ip/test_forward_ip_packet_with_0xffff_chksum_drop] Failed in tx cnt check
6 participants