SAI debug generate dump support #1846
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,266 @@ | ||
# Generate SAI Debug Dump | ||
|
||
## Table of Contents | ||
|
||
- [Generate SAI Debug Dump](#generate-sai-debug-dump) | ||
- [Table of Contents](#table-of-contents) | ||
- [Revision](#revision) | ||
- [Scope](#scope) | ||
- [Terminology](#terminology) | ||
- [Overview](#overview) | ||
- [Requirements](#requirements) | ||
- [Assumptions](#assumptions) | ||
- [Architecture Design](#architecture-design) | ||
- [Implementation](#implementation) | ||
- [generate_sai_dump bash script](#generate_sai_dump-bash-script) | ||
- [show techsupport](#show-techsupport) | ||
- [DbgGenDump orchestration](#dbggendump-orchestration) | ||
- [DB Enhancements](#db-enhancements) | ||
- [SONIC support global API sai_dbg_generate_dump](#sonic-support-global-api-sai_dbg_generate_dump) | ||
- [syncd extended operation](#syncd-extended-operation) | ||
- [SAI API](#sai-api) | ||
- [YANG model changes](#yang-model-changes) | ||
- [CLI](#cli) | ||
- [Warmboot and Fastboot Design Impact](#warmboot-and-fastboot-design-impact) | ||
- [Testing Requirements/Design](#testing-requirementsdesign) | ||
- [Unit Test cases](#unit-test-cases) | ||
- [System Test cases](#system-test-cases) | ||
|
||
### Revision | ||
|
||
| Rev | Date | Author | Change Description | | ||
| :-: | :------: | :-----------------------: | ------------------ | | ||
| 0.1 | 10/15/24 | Aviram Dali (**Marvell**) | Initial Draft | | ||
### Scope | ||
|
||
This document describes the design for generating a SAI debug dump file on user request, specifically through the `show techsupport` command. | ||
|
||
### Terminology | ||
|
||
| Term | Definition | | ||
| ----- | --------------------------------------- | | ||
| ASIC | Application Specific Integrated Circuit | | ||
| SYNCD | ASIC Synchronization Service | | ||
| SAI | Switch Abstraction Interface | | ||
| API | Application Programming Interface | | ||
| SWSS | Switch State Service | | ||
### Overview | ||
A SAI dump file typically includes SDK information and configuration, SAI statistics, and a capture of lower-layer SAI state such as register values. | ||
|
||
Currently, the SAI dump file is generated only on SAI failures, by running a dedicated executable named `saisdkdump`, which links against the SAI library and creates a new switch in redundant mode during initialization. | ||
|
||
This new feature allows users to generate a SAI debug dump file with the `show techsupport` command at any time, not only on failure. | ||
|
||
### Requirements | ||
|
||
+ Add infrastructure to generate a SAI debug dump file upon user request. | ||
+ Generate a SAI debug dump file from the `show techsupport` command. | ||
+ Generate a SAI debug dump file within the context of syncd. | ||
+ By default, the operation is blocking; a configuration option will allow making it non-blocking. | ||
+ For a blocking operation, the timeout for file readiness will be configurable. | ||
+ Maintain the existing mechanism for generating the SAI debug dump file on failure. | ||
|
||
### Assumptions | ||
|
||
The SAI global API `sai_dbg_generate_dump` operates as a blocking function. | ||
|
||
### Architecture Design | ||
|
||
This feature adds new infrastructure without changing the existing SONiC architecture. | ||
|
||
1. A user command, such as `show techsupport`, triggers `generate_sai_dump`, which creates a new table in the APPL DB containing the name of the dump file to create. | ||
2. A new orchestration agent, `DbgGenDumpOrch`, is triggered to handle the request. | ||
3. `DbgGenDumpOrch` writes the file name to the ASIC DB and sets a new operation `REDIS_ASIC_STATE_COMMAND_DBG_GEN_DUMP` for syncd. | ||
4. Syncd calls `dbgGenerateDump`, the `SaiInterface` wrapper around the global SAI API `sai_dbg_generate_dump`, to generate the debug dump file, which is saved in syncd's file system. | ||
5. Syncd sends a reply back to `DbgGenDumpOrch`. | ||
6. `DbgGenDumpOrch` analyzes the response. | ||
7. `DbgGenDumpOrch` updates the result in the APPL STATE DB. | ||
8. The user command retrieves the result. | ||
9. The debug dump file is pulled on success. | ||
|
||
The diagram below illustrates the debug dump generation flow: | ||
|
||
![Architecture Design](images/generate_debug_dump_file.png) | ||
|
||
|
||
### Implementation | ||
|
||
#### generate_sai_dump bash script | ||
Introduced a new script `/usr/local/bin/gen_sai_dbg_dump.sh` | ||
|
||
``` | ||
############################################################################### | ||
# generate_sai_dump | ||
# | ||
# Description: | ||
# This function: | ||
# - ensures that the `syncd` container is running before initiating the dump; | ||
# - triggers the generation of a SAI debug dump file through the Redis APPL DB; | ||
# - waits for the file by polling (with a timeout) the APPL STATE DB for the result; | ||
# - removes the table from the DB when done. | ||
# | ||
# Arguments: | ||
# $1 - Filename for the SAI debug dump file. | ||
# $2 - Optional timeout for file readiness (default: 60 seconds). | ||
# | ||
# Returns: | ||
# 0 - On success | ||
# 1 - On failure | ||
############################################################################### | ||
generate_sai_dump() { | ||
... | ||
} | ||
``` | ||
|
||
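The polling step described in the function header above can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper name `wait_for_dump_status` and the one-second polling interval are assumptions, and the status reader is passed in as a command so the sketch does not depend on how the real script queries the APPL STATE DB.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the HLD): poll a status command until it
# prints a non-empty value or the timeout expires.
#   $1   - timeout in seconds
#   $2.. - command that prints the current status (empty output = not ready)
wait_for_dump_status() {
    local timeout_sec="$1"; shift
    local elapsed=0
    local status=""

    while true; do
        # Run the caller-supplied reader; ignore its stderr.
        status="$("$@" 2>/dev/null)"
        if [[ -n "$status" ]]; then
            printf '%s\n' "$status"
            return 0
        fi
        # Give up once the timeout has been reached.
        if (( elapsed >= timeout_sec )); then
            return 1
        fi
        sleep 1                      # assumed polling interval
        elapsed=$(( elapsed + 1 ))
    done
}
```

A call such as `wait_for_dump_status 60 redis-cli -n 0 HGET "DBG_GEN_DUMP_STATUS_TABLE:DUMP" status` would mirror the 60-second default, using the key and field shown in the DB Enhancements section; the exact reader command used by the real script is not specified in this document.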
The script can also be invoked from the CLI to generate the dump file directly under the given name (without calling the `show techsupport` command): | ||
``` | ||
/usr/local/bin/gen_sai_dbg_dump.sh -f /tmp/my_dump_file.log | ||
``` | ||
|
||
#### show techsupport | ||
Introduced a new generic API, `generate_sai_dbg_dump_file`, in `generate_dump.sh` (invoked by the `show techsupport` command) to create a debug dump file: | ||
For nvidia platform, sai_dbg_generate_dump call generates a bunch of files. Eg:
I'm not aware of other vendors but I believe SONiC should have flexibility to provide a path for vendor SAI to dump whatever it deems relevant. Thus, I recommend the following, let me know what you think
I initially assumed that the SAI API generates a single file since it receives a specific file name rather than just a path. However, I have now realized that the API may generate multiple files.
Having a standard directory for everyone should suffice and the thread that collects the debug dump should be made singleton. i.e. only one instance runs at a time. I think after compressing and moving the contents, we should clear the folder.
|
||
|
||
``` | ||
############################################################################### | ||
# generate_sai_dbg_dump_file | ||
# | ||
# Description: | ||
# This function triggers the generation of a SAI debug dump file and saves the | ||
# dumped file in the show techsupport output directory. | ||
# | ||
# Globals: | ||
# None | ||
# | ||
# Arguments: | ||
# $1 - (required) File name (without path) under which the SAI debug dump is | ||
# saved in the show techsupport output directory. | ||
# | ||
# Returns: | ||
# 0 - On success | ||
# 1 - On failure | ||
############################################################################### | ||
generate_sai_dbg_dump_file(){ | ||
... | ||
} | ||
``` | ||
|
||
Usage: | ||
|
||
``` | ||
generate_sai_dbg_dump_file "sai_sdk_dump_$(date +"%m_%d_%Y_%I_%M_%p")" | ||
``` | ||
|
||
#### DbgGenDump orchestration | ||
- A new orchestration agent, `DbgGenDumpOrch`, has been introduced, which is triggered by updates in the APPL DB. | ||
|
||
- It updates syncd by writing to the ASIC DB and waits for a response. Once received, it writes the result back to the APPL STATE DB, allowing the calling application to retrieve the file. | ||
|
||
#### DB Enhancements | ||
|
||
Introduced a new table in the APPL DB: | ||
|
||
``` | ||
key = DBG_GEN_DUMP_TABLE:DUMP ; Unique identifier for gen dump file. | ||
;field = value | ||
file_name = STRING ; Full path of the file in which to save the dump. | ||
``` | ||
|
||
Example: | ||
``` | ||
redis-cli -n 0 HGETALL "DBG_GEN_DUMP_TABLE:DUMP" | ||
1) "file_name" | ||
2) "/var/log/sai_dump_file.log" | ||
``` | ||
|
||
Introduced a new table in the APPL STATE DB, which the caller polls for the dump generation result: | ||
``` | ||
key = DBG_GEN_DUMP_STATUS_TABLE:DUMP ; Unique identifier for gen dump file result | ||
;field = value | ||
status = SAI_STATUS ; result status of file dump generation | ||
``` | ||
|
||
Example: | ||
``` | ||
redis-cli -n 0 HGETALL "DBG_GEN_DUMP_STATUS_TABLE:DUMP" | ||
1) "status" | ||
2) "0" | ||
``` | ||
|
||
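A caller that reads the `status` field needs to turn it into a success or failure result. The sketch below shows one way to do this. `SAI_STATUS_SUCCESS` is defined as 0 in the SAI headers, which matches the `"0"` in the example above. The helper name `dump_status_ok` is hypothetical, and accepting the symbolic form `SAI_STATUS_SUCCESS` is an assumption in case the status is stored serialized rather than numeric.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the HLD): map the status value read from
# DBG_GEN_DUMP_STATUS_TABLE to a shell exit code.
#   $1 - value of the "status" field
dump_status_ok() {
    local status="$1"
    case "$status" in
        0|SAI_STATUS_SUCCESS)
            # SAI_STATUS_SUCCESS is 0; the symbolic form is accepted as an assumption.
            return 0
            ;;
        *)
            echo "SAI debug dump failed, status: ${status:-<none>}" >&2
            return 1
            ;;
    esac
}
```

Any other value, including an empty one, is treated as a failure, so a missing result is never mistaken for a successful dump.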
Introduced a new table in the ASIC DB: | ||
|
||
``` | ||
key = DBG_GEN_DUMP:DUMP ; Unique identifier for gen dump file. | ||
;field = value | ||
file_name = STRING ; Full path of the file in which to save the dump. | ||
``` | ||
|
||
Example: | ||
|
||
``` | ||
redis-cli -n 1 HGETALL "DBG_GEN_DUMP:DUMP" | ||
1) "DBG_GENERATE_DUMP" | ||
2) "/var/log/sai_dump_file.log" | ||
``` | ||
|
||
#### SONIC support global API sai_dbg_generate_dump | ||
`sai_dbg_generate_dump` is already supported in SAI. As with other global APIs supported in SONiC, add `sai_dbg_generate_dump` support to the `SaiInterface` class and ensure that all derived classes provide the corresponding implementation. | ||
|
||
``` | ||
class SaiInterface{ | ||
... | ||
virtual sai_status_t dbgGenerateDump( | ||
_In_ const char *dump_file_name) = 0; | ||
... | ||
}; | ||
``` | ||
|
||
#### syncd extended operation | ||
|
||
As with other global APIs supported in SONiC, add a new operation to syncd to support SAI debug dump generation: | ||
|
||
``` | ||
sai_status_t Syncd::processSingleEvent( | ||
_In_ const swss::KeyOpFieldsValuesTuple &kco) | ||
{ | ||
... | ||
|
||
if (op == REDIS_ASIC_STATE_COMMAND_DBG_GEN_DUMP) | ||
return processDbgGenerateDump(kco); | ||
``` | ||
|
||
|
||
``` | ||
sai_status_t Syncd::processDbgGenerateDump( | ||
_In_ const swss::KeyOpFieldsValuesTuple &kco) | ||
{ | ||
... | ||
//call SAI dbgGenerateDump API | ||
sai_status_t status = m_vendorSai->dbgGenerateDump(file_path); | ||
... | ||
//update ASIC DB with the result | ||
m_selectableChannel->set(sai_serialize_status(status), {} , REDIS_ASIC_STATE_COMMAND_DBG_GEN_DUMPRESPONSE); | ||
|
||
return status; | ||
} | ||
``` | ||
|
||
#### SAI API | ||
There are currently no new SAI APIs required for this feature. | ||
|
||
#### YANG model changes | ||
No Changes. | ||
|
||
#### CLI | ||
No changes. | ||
|
||
### Warmboot and Fastboot Design Impact | ||
There is no impact on warmboot or fastboot. | ||
|
||
### Testing Requirements/Design | ||
|
||
#### Unit Test cases | ||
Run the script and verify that the dump file exists: | ||
`/usr/local/bin/gen_sai_dbg_dump.sh -f /tmp/my_dump_file.log` | ||
|
||
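The unit test check above could be scripted along these lines. This is only a sketch: `verify_dump_created` is a hypothetical helper, the generator command is passed in as arguments so the check can be exercised without syncd, and treating an empty file as a failure is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the HLD): run a dump command and verify
# that it produced a non-empty file.
#   $1   - expected dump file path
#   $2.. - command that should create the file
verify_dump_created() {
    local dump_file="$1"; shift

    # Start clean so a stale file from an earlier run cannot pass the check.
    rm -f -- "$dump_file"

    if ! "$@"; then
        echo "dump command failed: $*" >&2
        return 1
    fi
    # Assumption: an empty dump file counts as a failure.
    if [[ ! -s "$dump_file" ]]; then
        echo "dump file missing or empty: $dump_file" >&2
        return 1
    fi
    return 0
}
```

On a device, the check would be invoked as `verify_dump_created /tmp/my_dump_file.log /usr/local/bin/gen_sai_dbg_dump.sh -f /tmp/my_dump_file.log`.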
#### System Test cases | ||
Verify that the `show techsupport` output contains the SAI dump file. | ||
|
This may be a bottleneck in the regular orchagent to syncd flow using ASIC_DB. As the commands are synchronous, during techsupport it may block other events.
If the saisdkdump takes time this may even cause a timeout between orchagent and syncd. If the motivation is just to move away from saisdkdump, I don't see any advantage in this flow and it might even cause performance issues.
This was discussed and consensus is to provide an ability to make the call non blocking. This should be configurable. @aviramd Please add this to the document
Currently, I have added a general remark in the HLD requirements for this. Once I implement it, I will provide more detailed instructions on how to configure it.