Bandwidth problem with sja1105 port #47
Comments
The default configuration for Qbv does not permit 1000 Mbps bandwidth for best-effort traffic anyway.
My description may have been confusing. I did not enable Qbv when testing bandwidth; the test was meant as a baseline for comparison with the Qbv demo. The bandwidth is now back to normal, and I will report back the next time this problem happens.
I think I know the reason for the sja1105 bandwidth change. When this happens, there are errors on the ingress port of the sja1105 that cause frame drops, so the tester cannot measure the correct bandwidth. The dropped frames account for about one thousandth of the total. The errors are mainly CRCERR, with some SOFERR and MIIERR. When testing with only one sja1105, this happens occasionally; when connecting it to another sja1105 or to a normal switch, the problem always occurs. MAC-Level Diagnostic Flags High-Level Diagnostic Counters
Thank you for the investigation done so far.
May I also know which port number these errors are seen on? I need this info for some further commands.
This is the result of a recent test. These errors are seen on port 0 (eth5). MAC-Level Diagnostic Counters MAC-Level Diagnostic Flags High-Level Diagnostic Counters
[root@OpenIL:init.d]# ./S46sja1105-link-speed-fixup start
Can you please further run the following commands after you observe the RGMII errors? You should run them once before the frame errors occur, and once afterwards (the reason is that the counters get cleared upon read):
Also, what would it take for me to try to reproduce this? How many cables do you have connected to the switch? Is the temperature higher than usual? It happens even when the link partner is another LS1021A-TSN switch port, right? Are both boards connected to the same ground reference?
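The reason for reading the counters once before and once after the errors occur can be illustrated with a small model of a clear-on-read counter. This is only a sketch: the class and method names (`ClearOnReadCounter`, `record`, `read`) are hypothetical and do not correspond to any SJA1105 or PHY API, and the event counts are made up.

```python
class ClearOnReadCounter:
    """Toy model of a hardware counter that resets to zero when read."""

    def __init__(self):
        self._count = 0

    def record(self, events=1):
        # Called whenever the (simulated) hardware sees an error event.
        self._count += events

    def read(self):
        # Return the current value and clear it, like a clear-on-read register.
        value, self._count = self._count, 0
        return value


counter = ClearOnReadCounter()
counter.record(5)               # stale events from before the test (made-up number)
baseline = counter.read()       # first read: discards the stale events
counter.record(3)               # events that occur while the test stream runs
during_test = counter.read()    # second read: only events since the first read
print(baseline, during_test)    # prints "5 3"
```

Without the first read, the second readout would mix old events with those caused by the test stream.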
Before sending the test stream: MAC-Level Diagnostic Counters [root@OpenIL:]# /etc/init.d/S46sja1105-link-speed-fixup start
After sending the test stream: MAC-Level Diagnostic Counters High-Level Diagnostic Counters [root@OpenIL:etc]# /etc/init.d/S46sja1105-link-speed-fixup start
Are you performing any SPI transactions to any of the switches when this is happening? Or are the systems simply idling and passing traffic?
No, I was not doing anything with it while sending the test stream.
The PHY counters I asked you to read are indicating that bad start-of-stream delimiters have been found in received frames since the last readout. So whatever the SJA1105 port is seeing, the PHY is seeing too. May I know what the tester is testing for? Frame preemption, by any chance? Does the tester have the ability to decode raw Ethernet code words? Do you have a capture of the frames that trigger the bad SSD error? What is the structure of the test stream?
The ETH5 port connected to the tester (LS1021ATSN in Figure 1 and LS1021ATSN-1 in Figure 2) sometimes sees packet loss, while the ETH5 port connected to another LS1021ATSN (LS1021ATSN-2 in Figure 2) always sees it. The counter values of the ports that lose packets are very close across the different setups, so I only show one.
Have you made any progress with this? I am not able to confirm the behavior with traffic based on your PCAP, or provide other debugging hints. Is your switch configuration XML different from the standard?
Hello, I am running the demo with one TSN board (LS1021ATSN), following the PDF (Open Industrial Linux User Guide Release v0.2), but when I did the schedule configuration (6.8.6), something went wrong, like this:
modify failed!
I am new to this. Do you know what is wrong? Thank you.
As part of a demonstration of the TSN functionality of the LS1021ATSN-PA board running Open-IL v1.7 (Xenomai/Cobalt v3.1-devel), I followed the procedure for this hardware in the "Open Industrial Linux User Guide, Rev 1.6 08/2019" (chapter 7.2), after setting up the network topology presented in chapter 7.2.2, page 114 (three LS1021ATSN-PA boards linked together).

I encountered a problem at the first step, when setting up the standard configuration (expected results are covered in chapter 7.2.8.5.4). The bandwidth obtained from board 2 to board 3 and from board 1 to board 3 is "chaotic", very far from 950 Mbit/s. I ran "sja1105-tool status port" on each board and observed the N_MIIERR counters incrementing (port 1 on board 1, ports 1 and 2 on board 2, and port 2 on board 3) while iPerf3 was running (source 172.15.0.1, destination 172.15.0.3). Bandwidth drops rapidly (by over 90%) and oscillates around 10 Mbit/s. The board 2 to board 3 test shows the same issue. This test was performed over TCP. Over UDP the problem is less obvious, but there is still a loss of 50%.

In addition, in the "Rate-Limiting - Prioritizing configuration" scenario with priorities in place (flow 1 to 3 prioritized over flow 2 to 3), I saw (UDP test) the bandwidths alternate: 1 to 3 at around 100 Mbit/s with 2 to 3 at around 500 Mbit/s for 5 s, then the reverse, and so on.

Regarding the "Synchronized Qbv" demonstration, despite the bandwidth problem, I could observe the expected result for the "3-HOP" scenario (stable latency of 30 ms).

Do you have any suggestions for further investigation, or an idea of the origin of the problem, to help us set up a representative demonstration?

In addition to my issue description, below is a test that reads, while iPerf3 is running, the control register of PHY3 and PHY2 provided by the BCM56514R.
The collision test bit appears to have been set about twenty times on the board where I monitored it, for PORT 2 (ETH3) connected to board 1 and for PORT 1 (ETH2) connected to board 3. The test steps are described below. Could you give me more information about the registers read in #47 (comment)? I could not find a register specification for the BCM56514R.
[root@OpenIL:~]# iperf3 -1 -f m -i 0.5 -s -p 5202
[root@OpenIL:~]# iperf3 -t 86400 -p 5202 -c 172.15.0.3
while true; do etsec_mdio read 3 0x0; done | tee /tmp/outP1_ETH2;
PORT 2 || MAC-Level Diagnostic Counters || PORT 3 || MAC-Level Diagnostic Counters ||
[root@OpenIL:~]# grep 11e1 /tmp/outP1_ETH2 | wc -l
[root@OpenIL:~]# grep 11e1 /tmp/outP2_ETH3 | wc -l
Thank you for your feedback,
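For readers following along, the value 0x11e1 counted by the grep commands above can be decoded against the standard bit layout of MII register 0 (the basic control register). The sketch below uses the bit masks from IEEE 802.3 Clause 22, which match the `BMCR_*` constants in the Linux `mii.h` header; the function names `decode_bmcr` and `undefined_bits` are hypothetical helpers, and the sketch says nothing about whether the raw read itself was reliable.

```python
# Bit masks for MII register 0 (basic control register), per IEEE 802.3
# Clause 22. Bits 5:0 are deliberately left out of this table.
BMCR_BITS = {
    0x8000: "reset",
    0x4000: "loopback",
    0x2000: "speed select (LSB)",
    0x1000: "auto-negotiation enable",
    0x0800: "power down",
    0x0400: "isolate",
    0x0200: "restart auto-negotiation",
    0x0100: "full duplex",
    0x0080: "collision test",
    0x0040: "speed select (MSB)",
}


def decode_bmcr(value):
    """Return the names of all table bits that are set in a register value."""
    return [name for mask, name in BMCR_BITS.items() if value & mask]


def undefined_bits(value):
    """Return the bits of a 16-bit value that the table above does not name."""
    known = sum(BMCR_BITS)  # sums the dict keys, i.e. ORs all masks together
    return value & ~known & 0xFFFF


print(decode_bmcr(0x11E1))
# ['auto-negotiation enable', 'full duplex', 'collision test', 'speed select (MSB)']
print(hex(undefined_bits(0x11E1)))  # 0x21
print(decode_bmcr(0x1140))
# ['auto-negotiation enable', 'full duplex', 'speed select (MSB)']
```

Compared with 0x1140, the value 0x11e1 has the collision test bit set and also has two low-order bits set that this table does not name.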
Hi there, I'm sorry for the trouble and I'm also aware of the NXP support ticket you have opened.
So it is perhaps indicative of a hardware issue (misconfiguration or otherwise): the PHY has either asserted the RX_ER signal, or deasserted the RX_DV signal of the switch's MAC. We have not seen this manifest during development or testing.

The unfortunate part is that the default LS1021A-TSN image is not equipped with software for proper debugging for this kind of issue. The sja1105-tool being a user space driver, it does not register net devices in the kernel, so it cannot register with the PHY library, to get a driver in control of the BCM5464R or cannot even perform any sort of MDIO access towards the PHY. This cannot be changed given that sja1105-tool is what it is (a user space driver). What the etsec_mdio script does is more of a hack: it copies what the kernel driver does (
I think that even unbinding eth0 and eth1 would be enough to get a more reliable read:

But if there is a PHY configuration issue, that would still be difficult to spot with raw MDIO accesses. The BCM5464R PHY, in the default OpenIL setup, is left with mostly the defaults configured via pin strapping, with the exception of link speeds which are forced to 1000 in the
The lack of a PHY driver was one of the main reasons for moving sja1105 to a kernel driver, and if you are willing to spend some time, then it would be helpful if you could give the mainline kernel a try. There, the switch ports are registered as swp2, swp3, swp4, swp5, and the MAC statistics can be retrieved with ethtool -S swp2 (it's the same information, but has the advantage that the
Hope this helps,
Thank you very much for your reply. I will continue the investigation taking into account your advice :)
Thank you very much for your advice. Following your recommendations, I performed the following steps:
Unfortunately, I observe the same issue, with disastrous bandwidth: "ethtool -S swp2" result: NIC statistics:
"ifconfig" result: swp2 Link encap: Ethernet HWaddr 00:04:9F:EF:05:05
Some phytool reads on the PHYs connected to TSN switch ports 2 and 1:
phytool read swp2/3/0x12
phytool read swp3/4/0x12
phytool read swp3/4/0x12
A hardware issue seems to be the cause of the errors encountered with our three boards.
Thanks for the work investigating this.
Does the PHY report receive errors from other link partners too? What happens if you change the link speed with "ethtool -s swp2 advertise 0x8" (for 100 Mbps, or 0x20 to go back to 1 Gbps)? Could the cables be an issue?
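The two advertise values mentioned above are bit masks of link modes. As a rough illustration, the sketch below maps the six lowest bits to the mode names listed in the ethtool man page; the table covers only those six modes, and the helper name `describe_advertise` is hypothetical.

```python
# Lowest six link-mode bits accepted by "ethtool -s <dev> advertise <mask>",
# as listed in the ethtool(8) man page. Higher-speed modes are omitted here.
ADVERTISE_MODES = {
    0x001: "10baseT/Half",
    0x002: "10baseT/Full",
    0x004: "100baseT/Half",
    0x008: "100baseT/Full",
    0x010: "1000baseT/Half",
    0x020: "1000baseT/Full",
}


def describe_advertise(mask):
    """Return the link modes selected by an advertise bit mask."""
    return [name for bit, name in ADVERTISE_MODES.items() if mask & bit]


print(describe_advertise(0x8))   # ['100baseT/Full']
print(describe_advertise(0x20))  # ['1000baseT/Full']
print(describe_advertise(0x28))  # ['100baseT/Full', '1000baseT/Full']
```

So 0x8 restricts auto-negotiation to 100 Mbps full duplex, and 0x20 to 1 Gbps full duplex.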
Hi, I found a problem when I tried the Qbv demo. I connected two hosts through an sja1105 and found that the bandwidth between them is unstable. The bandwidth is sometimes close to 1000 Mbps, but sometimes only about 500 Mbps. The switch is set to the default TSN configuration. When I connected the two hosts via two sja1105 switches, the situation got worse and the bandwidth was only about 10 Mbps. Do you know the reason for this problem?