[Mellanox] Update SDK sniffer default target folder #12

Closed
wants to merge 458 commits into from

@dprital dprital commented Mar 18, 2024

What I did

Change the target path for the SDK sniffer from "/var/log/mellanox/sniffer/" to "/var/log/sdk_dbg".

How I did it

Change the default for SDK_SNIFFER_TARGET_PATH

How to verify it

Run the SDK sniffer and make sure the sniffer output file is kept in the new location.
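
The verification step could be scripted roughly as follows. This is a minimal sketch, assuming the sniffer writes regular files directly into the target directory; the helper name `sniffer_files`, the sample file name, and the temporary directory standing in for `/var/log/sdk_dbg` are illustrative assumptions and not part of this change.

```python
import os
import tempfile

# New default target path from this PR (the old one was /var/log/mellanox/sniffer/).
SDK_SNIFFER_TARGET_PATH = "/var/log/sdk_dbg"


def sniffer_files(target_dir):
    """Return the sorted names of regular files found directly in target_dir."""
    if not os.path.isdir(target_dir):
        return []
    return sorted(
        name for name in os.listdir(target_dir)
        if os.path.isfile(os.path.join(target_dir, name))
    )


# Demo against a temporary directory standing in for the real path.
with tempfile.TemporaryDirectory() as demo_dir:
    # Hypothetical output file name, used only for the demo.
    open(os.path.join(demo_dir, "sdk_sniffer_demo.pcap"), "w").close()
    demo_result = sniffer_files(demo_dir)
```

On a device, calling `sniffer_files(SDK_SNIFFER_TARGET_PATH)` after a sniffer run should return a non-empty list.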

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

jfeng-arista and others added 30 commits February 1, 2023 11:29
…ue/port" can work. (sonic-net#2499)

* Add asic id for linecards so "show fabric counters queue/port" can work.
* Add test coverage


---------

Signed-off-by: Jie Feng <[email protected]>
* show logging CLI support for logs stored in tmpfs

Signed-off-by: Mihir Patel <[email protected]>

* Fixed testcase failures

* Reverted unwanted change in a file

* Added testcase for syslog.1 in log.tmpfs directory

* mend

---------

Signed-off-by: Mihir Patel <[email protected]>
LOG_LEVEL_DB may contain invalid data or a KEY_SET that was not cleaned up because of an earlier issue. For example, we observed a _gearsyncd_KEY_SET set in LOG_LEVEL_DB that was preserved across warm reboot. This key is not of type hash, which leads to an exception and a migration failure. The migration logic should be more robust, allowing users to upgrade even if some daemon has left leftovers in LOG_LEVEL_DB or invalid data has been written.

- What I did
Fix a migration issue that leads to the device configuration being lost.

- How I did it
Wrap the logic in try/except/finally.

- How to verify it
202205 -> 202211/master upgrade.
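
The exception-tolerant idea can be sketched as below. This is a simplified illustration, not the migrator's code: the function name `migrate_log_level_entries`, the dictionary standing in for the database, and `fake_read_hash` are all hypothetical, and the sketch only shows skipping unreadable keys rather than the full try/except/finally structure.

```python
import logging

logger = logging.getLogger("db_migrator_sketch")


def migrate_log_level_entries(keys, read_hash):
    """Copy hash-typed entries, skipping keys that cannot be read as a hash.

    keys: iterable of key names; read_hash: callable returning a dict or raising.
    """
    migrated = {}
    skipped = []
    for key in keys:
        try:
            migrated[key] = read_hash(key)
        except Exception as err:  # e.g. a leftover *_KEY_SET that is not a hash
            logger.warning("Skipping %s: %s", key, err)
            skipped.append(key)
    return migrated, skipped


# Stand-in for the database: one valid hash and one leftover set.
fake_db = {
    "orchagent": {"LOGLEVEL": "NOTICE"},
    "_gearsyncd_KEY_SET": {"some_member"},  # a set, not a hash
}


def fake_read_hash(key):
    """Mimic a hash read that fails on a key of the wrong type."""
    value = fake_db[key]
    if not isinstance(value, dict):
        raise TypeError("key does not hold a hash")
    return value


migrated, skipped = migrate_log_level_entries(fake_db, fake_read_hash)
```

The point of the design is that one bad key costs a warning, not the whole migration.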

Signed-off-by: Stepan Blyschak <[email protected]>
* Added logic in techsupport script to collect SAI failure dump
…net#2629)

Signed-off-by: vaibhav-dahiya [email protected]
This PR adds support for show mux hwmode muxdirection as well as
show mux grpc muxdirection to show the state of gRPC connected to the SoCs for the 'active-active' cable type


vdahiya@sonic:~$ show mux grpc muxdirection 
Port       Direction    Presence    PeerDirection    ConnectivityState
---------  -----------  ----------  ---------------  -------------------
Ethernet0  active       False       active           READY
vdahiya@sonic:~$ 
vdahiya@sonic:~$ show mux grpc muxdirection --json
{
    "HWMODE": {
        "Ethernet0": {
            "Direction": "active",
            "Presence": "False",
            "PeerDirection": "active",
            "ConnectivityState": "READY"
        }
    }
}

What I did
Added support for the commands.

How I did it
How to verify it
UT and running the changes on Testbed
… for ZR (sonic-net#2630)

* Add transceiver info CLI support to show output from TRANSCEIVER_INFO for ZR

Signed-off-by: Mihir Patel <[email protected]>

* Added test case for info CLI

* Updated command reference

* Resolved merge conflicts

* Made convert_sfp_info_to_output_string generic for CMIS and non CMIS and added test case to address PR comment

* Resolved test_multi_asic_interface_status_all failure

* Addressed PR comments

---------

Signed-off-by: Mihir Patel <[email protected]>
…et#2495)

* [config/show] Add command to control pending FIB suppression

What I did
I added a command config suppress-pending-fib that will allow user to enable/disable this feature.
Once it is enabled, BGP will wait for route to be programmed to HW before announcing the route to the peers.

I also added a corresponding show command that prints the status of this feature.
What I did
Refer to sonic-net/sonic-buildimage#11171, protect loopback0 from deletion

How I did it
Add a patch checker that fails validation when loopback0 is removed

How to verify it
Unit test
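
A check of this kind might look like the sketch below. It assumes the patch is a list of JSON Patch (RFC 6902) operations and that Loopback0 lives at the ConfigDB path `/LOOPBACK_INTERFACE/Loopback0`; both the path constant and the function name `removes_loopback0` are assumptions for illustration, not the checker added by the commit.

```python
PROTECTED_PATH = "/LOOPBACK_INTERFACE/Loopback0"  # assumed ConfigDB path


def removes_loopback0(patch):
    """Return True if any JSON Patch operation removes Loopback0 or a parent of it."""
    for op in patch:
        if op.get("op") != "remove":
            continue
        path = op.get("path", "").rstrip("/")
        # Removing the entry itself, or any ancestor table, deletes Loopback0.
        if PROTECTED_PATH == path or PROTECTED_PATH.startswith(path + "/"):
            return True
    return False
```

A validator would reject the patch whenever this function returns True.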
…c-net#2660)

What I did
Currently, adding or deleting a VLAN doesn't change the related dhcpv6_relay config, which is incorrect.

How I did it
1. Add dhcp_relay table init entry while adding vlan
2. Delete dhcp_relay related config while deleting vlan
3. Add unit test

How to verify it
1. By unit test
2. install whl and run cli

Signed-off-by: Yaqiang Zhu <[email protected]>
…net#2599", add fixes for empty /dump folder and symbolic links (sonic-net#2645)

- What I did
0ee19e5 Revert the revert of the show-techsupport optimization PR sonic-net#2599
c8940ad Add a fix for the empty /dump folder inside the final tar archive generated by the show techsupport CLI command.
8a8668c Add a fix to not follow the symbolic links to avoid duplicate files inside the final tar archive generated by the show techsupport CLI command.

- How I did it
Modify the scripts/generate_dump script.

- How to verify it
1. Manual verification
do the show techsupport CLI command and save output original.tar.gz (with original generate_dump script)
do the show techsupport CLI command and save output fixes.tar.gz (with the generate_dump script modified by this PR)
unpack both archives original.tar.gz and fixes.tar.gz
compare both directories with ncdu & diff --brief --recursive original fixes Linux utilities
2. Run the community tests
sonic-mgmt/tests/show_techsupport

Signed-off-by: vadymhlushko-mlnx <[email protected]>
* no print if use json format
* add print for chassis
What I did
Add docs for dhcp_relay show/clear cli

How I did it
Add docs for dhcp_relay show/clear cli

Signed-off-by: Yaqiang Zhu <[email protected]>
- What I did
Add support of secure warm-boot to SONiC.
Basically, warm-boot loads a new kernel without doing a full/cold boot.
It does so by loading the new kernel and executing it with the kexec Linux command. As a result, even when the Secure Boot feature is enabled, a user (possibly a malicious one) can still load an unsigned kernel. To prevent that, we added support for secure warm boot.
More Description about this feature can be found in the Secure Boot HLD: sonic-net/SONiC#1028

- How I did it
In general, Linux supports it, so I enabled this support with the following steps:

I added some special flags in the Linux kernel that apply when the user builds sonic-buildimage with the secure boot feature enabled.
I added the "-s" flag to the kexec command.
Note: more details in the HLD above.

- How to verify it
* Good flow:
manually install a new secure image (a SONiC image that was built with the Secure Boot flag enabled) with sonic-installer
after the secure image is installed, do:
warm-reboot
Check that the new kernel is really loaded and switched to.
* Bad flow:
Do the same steps 1-2 as in the good flow but with an insecure image (a SONiC image that was built without Secure Boot enabled)
After the insecure image is installed and warm-boot is triggered, you should get an error saying that the new unsigned kernel from the insecure image was not loaded.
Automation test - TBD
… vlan (sonic-net#2678)

What I did
Remove the code that adds a vlanid field to the DHCP_RELAY table when adding a VLAN, because that field conflicts with the YANG model.

How I did it
Remove the code that adds a vlanid field to the DHCP_RELAY table when adding a VLAN.

How to verify it
By unit tests

Signed-off-by: Yaqiang Zhu <[email protected]>
Signed-off-by: maipbui <[email protected]>
#### What I did
`pickle` can lead to code execution vulnerabilities. It is recommended to serialize the relevant data as JSON instead.
#### How I did it
Replace `pickle` by `json`
#### How to verify it
Pass UT
Manual test
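
For illustration, a JSON round trip that could stand in for a `pickle` dump/load pair is sketched below. The helper names `save_state` and `load_state`, the file name, and the sample counters are hypothetical; the sketch assumes the cached data consists only of plain dicts, lists, strings, and numbers.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write state as JSON; only plain data types can be serialized."""
    with open(path, "w") as handle:
        json.dump(state, handle)


def load_state(path):
    """Read state back; json.load parses data and never executes code from the file."""
    with open(path) as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "counters.json")  # hypothetical file name
    save_state(cache_file, {"Ethernet0": {"rx_ok": 10, "tx_ok": 7}})
    restored = load_state(cache_file)
```

One trade-off to keep in mind: JSON object keys are always strings, so data keyed by integers or tuples needs converting before and after.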
…o support for muxcable (sonic-net#2414)

This PR adds support for some utility commands for muxcable
This includes commands for health, operationtime, queueinfo, resetcause

vdahiya@sonic:~$ show mux health Ethernet4 
PORT          ATTR               HEALTH
---------     ---------------   --------
Ethernet4     health_check       Ok
vdahiya@sonic:~$ show mux health Ethernet4 --json
{
    "health_check": "Ok"
}

vdahiya@sonic:~$ show mux operation Ethernet4 --json
{
    "operation_time": "22:22"
}
vdahiya@sonic:~$ show mux operation Ethernet4
PORT       ATTR              OPERATION_TIME
---------  --------------  ----------------
Ethernet4  operation_time                 22:22
vdahiya@sonic:~$ 

vdahiya@sonic:~$ show mux resetcause Ethernet4
PORT       ATTR           RESETCAUSE
---------  -----------  ------------
Ethernet4  reset_cause             0

vdahiya@sonic:~$ show mux resetcause Ethernet4 --json
{
    "reset_cause": "0"
}

vdahiya@sonic:~$ show mux queueinfo Ethernet4 --json
{
    "Remote": "{'VSC': {'r_ptr': 0, 'w_ptr': 0, 'total_count': 0, 'free_count': 0, 'buff_addr': 0, 'node_size': 0}, 'UART1': {'r_ptr': 0, 'w_ptr': 0, 'total_count': 0, 'free_count': 0, 'buff_addr': 209870, 'node_size': 1682183}, 'UART2': {'r_ptr': 13262, 'w_ptr': 3, 'total_count': 0, 'free_count': 0, 'buff_addr': 12, 'node_size': 0}}",
    "Local": "{'VSC': {'r_ptr': 0, 'w_ptr': 0, 'total_count': 0, 'free_count': 0, 'buff_addr': 0, 'node_size': 0}, 'UART1': {'r_ptr': 0, 'w_ptr': 0, 'total_count': 0, 'free_count': 0, 'buff_addr': 209870, 'node_size': 1682183}, 'UART2': {'r_ptr': 13262, 'w_ptr': 3, 'total_count': 0, 'free_count': 0, 'buff_addr': 12, 'node_size': 0}}"
}
…oot (sonic-net#2694)

- What I did
Add more logs for config reload/config minigraph/warm-reboot/fast/reboot so that the log (at notice level) identifies which executed command could have caused a service impact.

- How I did it
Add more logs for config reload/config minigraph/warm-reboot/fast/reboot.

- How to verify it
Manual test
…net#2692)

* add rdma gcu unit test

* fix comment

* clean unused code

* clean format

* extend to mock patchapplier, in place of changeapplier

* replace tabs with spaces
…et#2688)

Why I did
On a device that doesn't have the dhcp_relay service, restarting dhcp_relay after adding or deleting a VLAN fails.

How I did it
Add a check for whether the device supports the dhcp_relay service.

How to verify it
1. Unit test
2. Build and install in device

Signed-off-by: Yaqiang Zhu <[email protected]>
What I did
These 3 packages may be purged by default. Do not block the pipeline.
Download deb/whl packages only to accelerate download process.
How I did it
How to verify it
… DB (sonic-net#2691)

Fixes: 201911 to 202205 warm upgrade failure in fpmsyncd reconciliation due to missing weight attr in routes. (sonic-net/sonic-buildimage#12625)

How I did it
Check for a missing weight attribute in APPLDB route entries. If it is missing, the attribute is added with an empty value.

How to verify it
Verified on physical device. 201911 to 202205 upgrade worked fine.
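
The fix-up described above can be sketched on plain dictionaries as follows. The function name `add_missing_weight`, the key format, and the sample field values are assumptions for illustration; the real migrator works against the database rather than an in-memory dict.

```python
def add_missing_weight(routes):
    """Add an empty 'weight' field to every route entry that lacks one.

    routes maps a route key to its field dict; returns the keys that were changed.
    """
    updated = []
    for key, fields in routes.items():
        if "weight" not in fields:
            fields["weight"] = ""
            updated.append(key)
    return updated


# Sample entries: the first has no weight, the second already has one.
routes = {
    "ROUTE_TABLE:10.0.0.0/24": {"nexthop": "10.1.0.1", "ifname": "Ethernet0"},
    "ROUTE_TABLE:10.0.1.0/24": {"nexthop": "10.1.0.3", "ifname": "Ethernet4", "weight": "1"},
}
changed = add_missing_weight(routes)
```

Entries that already carry a weight are left untouched, so running the step twice changes nothing the second time.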
…utdown (sonic-net#2714)

Goal: Preserve logs during TOR upgrades and shutdown

Need:

Below PRs moved logs from disk to tmpfs for specific hwskus.
Due to these changes, shutdown path logs are now lost.
The logs in shutdown path are crucial for debug purposes.

sonic-net/sonic-buildimage#13805
sonic-net/sonic-buildimage#13587

How I did it
Check if logs are on tmpfs. If yes, backup logs from /var/log

How to verify it
Verified on a physical device - logs on tmpfs are backed up for the past 30 minutes.
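
One way to decide whether a path sits on tmpfs is to look at the mount table. The sketch below assumes input in the `/proc/mounts` layout (device, mount point, filesystem type, ...); the function name `is_on_tmpfs` and the sample mount table are illustrative, and the actual change may detect tmpfs differently.

```python
def is_on_tmpfs(path, mounts_text):
    """Return True if the longest mount point containing path is a tmpfs.

    mounts_text uses the /proc/mounts layout: device, mount point, fs type, ...
    """
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        mount_point, fs_type = parts[1], parts[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        if inside and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type == "tmpfs"


# Sample mount table standing in for the contents of /proc/mounts.
sample_mounts = """\
/dev/sda3 / ext4 rw 0 0
tmpfs /var/log tmpfs rw,size=131072k 0 0
"""
```

On a device, the second argument would be the text read from `/proc/mounts`, and a True result for `/var/log` would trigger the backup.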
…onic-net#2531)

* [route_check] implement a check for FRR routes not marked offloaded
* Implemented a route_check functionality that checks the "show ip route json" output from FRR and ensures that all routes are marked as offloaded. If some routes are not offloaded for 15 sec, this is considered an issue and mitigation logic is invoked.
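
The core of such a check can be sketched as below. It assumes the parsed JSON maps each prefix to a list of route entries and that each entry carries a boolean `offloaded` field; that shape, the function name `find_not_offloaded`, and the sample data are assumptions for illustration. The 15-second persistence window and the mitigation step are not shown.

```python
def find_not_offloaded(frr_routes):
    """Return prefixes with at least one entry not marked as offloaded.

    frr_routes maps a prefix to a list of route entries (dicts).
    """
    missing = []
    for prefix, entries in frr_routes.items():
        # A missing 'offloaded' field is treated the same as False.
        if any(not entry.get("offloaded", False) for entry in entries):
            missing.append(prefix)
    return sorted(missing)


# Sample data in the assumed shape.
sample_routes = {
    "10.0.0.0/24": [{"protocol": "bgp", "offloaded": True}],
    "10.0.1.0/24": [{"protocol": "bgp"}],
    "10.0.2.0/24": [{"protocol": "bgp", "offloaded": False}],
}
not_offloaded = find_not_offloaded(sample_routes)
```

A caller would repeat the check over time and only raise an alarm for prefixes that stay in the result for the whole window.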
What I did
Warm-reboot fails on kvm due to non-zero exit upon command
bootctl status 2>/dev/null | grep -c "Secure Boot: enabled"

How I did it
Added || true to return 0 when previous command fails.
Added CHECK_SECURE_UPGRADE_ENABLED to check output of previous command
Added debug logs

How to verify it
Run warm-reboot on kvm and on a physical device with increased verbosity. Expect the debug log to indicate secure/non-secure boot and the warm reboot to succeed.
…#2712)

What I did
Probing the mux direction does not always return success.

Sample output of: while [ 1 ]; do date; show mux hwmode muxdirection; show mux status; sleep 1; done

Mon 27 Feb 2023 03:12:25 PM UTC
Port         Direction    Presence
-----------  -----------  ----------
Ethernet16   unknown      True

PORT         STATUS    HEALTH    HWSTATUS      LAST_SWITCHOVER_TIME
-----------  --------  --------  ------------  ---------------------------
Ethernet16   standby   healthy   inconsistent  2023-Feb-25 07:55:18.269177
If we increase the timeout to 0.5 secs for getting the values back from ycabled, the inconsistency goes away and consistent values are displayed, because while telemetry is running, getting the actual mux value takes significantly longer than 0.1 seconds.

PORT         STATUS    HEALTH    HWSTATUS      LAST_SWITCHOVER_TIME
-----------  --------  --------  ------------  ---------------------------
Ethernet16   standby   healthy   consistent  2023-Feb-25 07:55:18.269177
How I did it
How to verify it
Manually run changes on setup
The worst-case CLI return time could be 16 seconds for 32 ports. On average each port takes 200 ms if telemetry is running, but on average the show command will return in < 1 sec for all 32 ports.
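
The timing figures above follow from simple multiplication, assuming ports are queried one after another. The sketch below only restates that arithmetic; the constant names are illustrative.

```python
PORTS = 32
TIMEOUT_SEC = 0.5        # new per-port timeout from this change
TYPICAL_BUSY_SEC = 0.2   # about 200 ms per port while telemetry is running

# Every port hits the full timeout.
worst_case_sec = PORTS * TIMEOUT_SEC
# Every port takes the busy-case average.
busy_total_sec = PORTS * TYPICAL_BUSY_SEC
```

So the 16-second figure is the bound when all 32 sequential queries time out, while a telemetry-loaded system would be expected to land well below it.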

Signed-off-by: vaibhav-dahiya <[email protected]>
* Add status for ACL_TABLE and ACL_RULE in STATE_DB
deepak-singhal0408 and others added 28 commits December 19, 2023 10:06
* Enhanced route_check.py for multi_asic platforms
* skip eth1 routes for packet-chassis, pytest enhancements
* Collect module EEPROM data in dump
What I did
Need to support golden config in db migrator.

How I did it
If there's golden config json, read from golden config instead of minigraph.
And db migrator will use golden config data to generate new configuration.

How to verify it
Run unit test.
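
The selection between the two sources can be sketched as follows. The function name `load_migration_source`, the file name, and the callable used as a minigraph loader are hypothetical; the sketch only shows the "prefer golden config if the file exists" decision.

```python
import json
import os
import tempfile


def load_migration_source(golden_path, load_minigraph):
    """Prefer the golden config JSON when present; otherwise use the minigraph loader."""
    if os.path.isfile(golden_path):
        with open(golden_path) as handle:
            return "golden_config", json.load(handle)
    return "minigraph", load_minigraph()


with tempfile.TemporaryDirectory() as tmp:
    golden = os.path.join(tmp, "golden_config_db.json")  # hypothetical file name
    # No golden config yet: fall back to the minigraph loader.
    source_before, _ = load_migration_source(golden, lambda: {"from": "minigraph"})
    with open(golden, "w") as handle:
        json.dump({"DEVICE_METADATA": {"localhost": {"hostname": "sonic"}}}, handle)
    # Golden config now exists and takes precedence.
    source_after, data_after = load_migration_source(golden, lambda: {"from": "minigraph"})
```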
…et#3075)

* Execute Route check script only when feature bgp is enabled

If bgp is not enabled, get_frr_routes() gets an empty list, the route check fails, and a traceback is thrown. Add a check to skip route checks when the bgp feature is disabled. On the chassis supervisor, bgp may be disabled.

Signed-off-by: Anand Mehra [email protected]
Depends on PR sonic-net/sonic-buildimage#17458

What I did
Add CLIs to enable/disable containercfgd to optimize warm/fast boot path

How I did it
Add CLIs to enable/disable containercfgd

How to verify it
unit test
manual test
What I did
db_migrator failed to initialize SonicDBConfig; this change fixes the issue.

How I did it
If SonicDBConfig is already initialized, do not invoke initialize() again.

How to verify it
Run unit test, and verified on DUT.
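
The guard described above is an initialize-once pattern. The sketch below uses a stand-in class, `FakeDBConfig`, whose method names are assumptions made for the example; it is not the real SonicDBConfig interface.

```python
class FakeDBConfig:
    """Stand-in for a config singleton that must be initialized at most once."""

    _initialized = False
    init_calls = 0

    @classmethod
    def is_init(cls):
        return cls._initialized

    @classmethod
    def initialize(cls):
        if cls._initialized:
            raise RuntimeError("already initialized")
        cls._initialized = True
        cls.init_calls += 1


def ensure_initialized(config_cls):
    """Call initialize() only when the config has not been initialized yet."""
    if not config_cls.is_init():
        config_cls.initialize()


ensure_initialized(FakeDBConfig)
ensure_initialized(FakeDBConfig)  # second call is a no-op instead of an error
```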
…no external neighbors are configured on chassis LC (sonic-net#3099)

Support show ip bgp summary to display without error when no external neighbors are configured on chassis LC
…atforms (sonic-net#3115)

Disabling key validation feature in grub file as it is not yet supported for Cisco platforms

What I did
Check if the platform we are installing the image on is a Cisco platform
Return success if so. This way, we do not perform signature verification, as this feature is not yet supported on our platforms.
How I did it
Modified sonic-installer grub.py code
… not selected as best (sonic-net#3130)

### What I did
Fixes sonic-net/sonic-buildimage#17877
#### How I did it
Added additional check to skip FRR-Offloaded check if the routeSrc BGP was not selected as best.
#### How to verify it
Ran the script on multi-asic KVM device, and could confirm the route_check is passing.
…KUs if the buffer configuration is empty (sonic-net#3114)

### What I did

Do not touch the buffer model on generic SKUs if the buffer configuration is empty.

#### How I did it

Set the buffer model to traditional on generic SKUs in Mellanox db migrator only if the buffer configuration is not default and not empty.

#### How to verify it

Manually and mock test.

### Details ####
Buffer configuration contains two parts:
1. the buffer model in `DEVICE_METADATA|localhost` which is from `init_cfg.json` and can be updated by Mellanox buffer migrator
2. the buffer pools, profiles, PGs, and queues which are rendered from the buffer templates in `config qos reload`

There was logic to update the buffer model in the Mellanox buffer migrator: if the buffer configuration is not default, the buffer model is set to traditional. However, if a device is installed from ONIE, the buffer configuration is also empty. As a result, the traditional buffer manager starts after the device is installed from ONIE, and the buffer manager must be restarted to switch to the dynamic model. This can be done only by `config reload`.
It didn't matter since it was required to execute `config qos reload` to complete buffer configuration which required `config save` and `config reload` in any case due to issue sonic-net/sonic-buildimage#9088.
Now that the issue has been fixed and `config reload` isn't required anymore to complete `config qos reload`, we should avoid setting the buffer model to traditional in such case, otherwise `config reload` is still required to switch the buffer model.

Verified the following scenarios:
1. non-default configuration generic SKU upgrade from 202305: warm/cold boot: expected: traditional model
2. default configuration generic SKU upgrade from 201911/202305: warm/cold boot: expected: dynamic model
3. install from ONIE: expected: dynamic model
4. MSFT SKU upgrade from 201911 by cold boot/ from 202012 by warm boot: expected: traditional model
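
For generic SKUs, the decision described in this section reduces to a small predicate. The sketch below is an illustration under stated assumptions: the function name `generic_sku_buffer_model` and its boolean inputs are hypothetical, and it does not cover MSFT SKUs (scenario 4).

```python
def generic_sku_buffer_model(current_model, config_is_default, config_is_empty):
    """Return the buffer model a generic SKU should end up with after migration.

    The model is switched to traditional only when the buffer configuration
    is neither default nor empty; otherwise the current model is kept.
    """
    if not config_is_default and not config_is_empty:
        return "traditional"
    return current_model


# Scenario 1: non-default configuration.
model_non_default = generic_sku_buffer_model("dynamic", False, False)
# Scenario 2: default configuration.
model_default = generic_sku_buffer_model("dynamic", True, False)
# Scenario 3: fresh install from ONIE, buffer configuration still empty.
model_onie = generic_sku_buffer_model("dynamic", False, True)
```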
)

### What I did

Account for static routes in route_check.py when checking route offload status.

#### How I did it

skip routes that are "connected" or "kernel".

#### How to verify it

Run on 202311, make sure it reports Loopback IPv6 address as missing.
…client.eth0.pid does not exist" (sonic-net#3149)

* Fix load_mgmt_config not exiting when dhclient.eth0.pid does not exist

Signed-off-by: Mai Bui <[email protected]>

* add UT

Signed-off-by: Mai Bui <[email protected]>

---------

Signed-off-by: Mai Bui <[email protected]>
…ump utility. (sonic-net#3091)

- What I did
Add support of the nvidia-bluefield platform to generate-dump utility to collect platform-specific dumps on NVIDIA Bluefield DPU.

- How I did it
Extend platform-specific section of generate-dump utility

- How to verify it
Run "show techsupport" command to generate dump file. Verify that platform-dump directory exists in the created dump file.
…onic-net#3153)

* Fix the sfputil treats page number as decimal instead of hexadecimal (sonic-net#22)

Fix the sfputil treats page number as decimal instead of hexadecimal
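
The difference between the two interpretations is easy to see with Python's `int`. The helper name `parse_page` is hypothetical and this is only a sketch of the parsing idea, not the sfputil code.

```python
def parse_page(page_text):
    """Interpret a page number given on the command line as hexadecimal."""
    return int(page_text, 16)


as_hex = parse_page("10")      # hexadecimal reading of "10"
as_decimal = int("10")         # decimal reading of the same text
```

With the decimal reading, page `10` would address page 10 instead of page 0x10 (16), and a page such as `a0` could not be parsed at all.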

Signed-off-by: Kebo Liu <[email protected]>

* remove unreachable code

Signed-off-by: Kebo Liu <[email protected]>

---------

Signed-off-by: Kebo Liu <[email protected]>
…SIC (sonic-net#3158)

PR sonic-net#3099 fixes the case where there are no BGP neighbors on a chassis Linecard. However, if the Linecard has neighbors on one ASIC but not on another, the command show bgp summary displayed no neighbors. This PR fixes that.

How I did it
Add check in bgp_util to create empty peer list only once
Add UT to cover this case
…ic-net#3148)

* [show] Update show run all to cover all asic config in masic

* per comment
### What I did
Fix  sonic-net#3164
Check Golden Config earlier before service is down.
#### How I did it
Move the check to the beginning
#### How to verify it
Unit test
* Fix sfputil invalid namespace error

* Add test case for loading port configuration

* Improve cov
…le (sonic-net#3177)

* Retrieve firmware version fields from TRANSCEIVER_FIRMWARE_INFO table

Signed-off-by: Mihir Patel <[email protected]>

* Fixed test failures

* Removed update_firmware_info_to_state_db function

* Revert "Removed update_firmware_info_to_state_db function"

This reverts commit 68f52a2.

---------

Signed-off-by: Mihir Patel <[email protected]>
@dprital dprital changed the title [Mellanox] Update SDK sniffer folder [Mellanox] Update SDK sniffer default target folder Mar 18, 2024
@dprital dprital closed this Mar 18, 2024