-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SAI debug generate dump support #1846
base: master
Are you sure you want to change the base?
SAI debug generate dump support #1846
Conversation
aviramd
commented
Nov 6, 2024
PR Title | State | Context |
---|---|---|
[sonic-utilities] add support for sai debug generate dump | ||
[sonic-swss] orchagent for generate SAI debug dump file | ||
[sonic-swss-common] add support for debug generate dump file | ||
[sonic-sairedis] add support for sai_dbg_generate_dump API |
/azp run |
No pipelines are associated with this pull request. |
e148706
to
3668df2
Compare
Fixed Problems/New Changes: Should be Tested: Signed-off-by: aviramd <[email protected]>
|
||
1. A user command, such as `show techsupport` triggers the `generate_sai_dump`, and creates a new table with the dump file name to create in the APPL DB. | ||
2. A new orchestration agent, `DbgGenDumpOrch`, is triggered to handle the request. | ||
3. `DbgGenDumpOrch` writes the file name to the ASIC DB and sets a new operation `REDIS_ASIC_STATE_COMMAND_DBG_GEN_DUMP` for syncd. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be a bottle neck in the regular orchagent to syncd flow using ASIC_DB. As the commands are synchronous, during techsupport it may block other events.
If the saisdkdump takes time this may even cause timeout between orchagent and syncd. If the motivation is to just move from saisdkdump I don't see any advantage in this flow and it might event cause performance issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was discussed and consensus is to provide an ability to make the call non blocking. This should be configurable. @aviramd Please add this to the document
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, I have added a general remark in the HLD requirements for this. Once I implement it, I will provide more detailed instructions on how to configure it.
``` | ||
|
||
#### show techsupport | ||
Introduced a new generic API, `generate_sai_dbg_dump_file`, in `generate_dump.sh` (invoked by the `show techsupport` command) to create a debug dump file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For nvidia platform, sai_dbg_generate_dump call generates a bunch of files.
Eg:
docker exec syncd mkdir -p /tmp/saisdkdump/
docker exec syncd saisdkdump -f /tmp/saisdkdump/sai_sdk_dump_12_10_2024_09_21_PM
ls -l /tmp/saisdkdump/
total 60948
-rw-r--r-- 1 root root 11854441 Dec 10 21:25 sai_sdk_dump_12_10_2024_09_21_PM
-rw-r--r-- 1 root root 1826 Dec 10 21:25 sai_sdk_dump_12_10_2024_09_21_PM.json
drwxr-xr-x 2 root root 4096 Dec 10 21:25 sai_sdk_dump_12_10_2024_09_21_PM_extras
-rw-r--r-- 1 root root 1137 Dec 10 21:25 sdk_dump_ext_.....summary.txt
-rw-r--r-- 1 root root 1201437 Dec 10 21:25 sdk_dump_ext_.....driver.txt
...............
I'm not aware of other vendors but i believe SONiC should have flexibility to provide a path for vendor SAI to dump whatever it deems relevant. Thus, i recommend the following, let me know what you think
- Update generate_sai_dbg_dump_file to take in two parameters
Eg:generate_sai_dbg_dump_file -p "$path" -f "$filename"
- Make sure syncd runs sai_dbg_generate_dump with arg $path/$filename
- After the execution is complete, techsupport process will copy the contents under $path back into the host and save them into the tar file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially assumed that the SAI API generates a single file since it receives a specific file name rather than just a path. However, I have now realized that the API may generate multiple files.
To accommodate this, I will modify the implementation so that theshow tech-support
process copies all files from the specified directory instead of targeting a single file.
In my opinion, Since the generate_sai_dbg_dump_file
API is generic for all vendors and is invoked as part of the main show tech-support
flow, the path for the SAI to place the files in the syncd container remains generic as well, using /var/log/dbg_gen_dump.log
. The generate_sai_dump
function in the gen_sai_dbg_dump.sh
script ensures that the /var/log/
directory exists and creates it if necessary. This directory is temporary, as the files generated are moved from the syncd
filesystem to the host machine once the process completes. Therefore, I don’t see any value in defining separate paths for different vendors.
Ultimately, since the purpose of calling generate_sai_dbg_dump_file
is to generate the dump files and copy them to the show tech-support folder, the temporary paths where the files are stored in between do not really matter.
To avoid potential conflicts, I will create a unique directory under /var/log/
to store the files temporarily, ensuring that such a directory does not already exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a standard directory for everyone should suffice and the thread that collects the debug dump should be made singleton. i.e. only one instance runs at a time.
I think after compressing and moving the contents, we should clear the folder.
1.add option for blocking and non blocking flow 2. add timeout configuration 3.result set to APP STATE DB
Community review recording on 12/10/2024 https://zoom.us/rec/share/CbcJ93t3qSx0EcqtXTm2jIjmFHUu2BThFONnRNLEUtTjGJTbppz13nRuc_Sypxjx.Yq5Be4v1s70elUmT |