CLI support for SmartSwitch PMON #3271

Open
wants to merge 177 commits into
base: master
Changes from 2 commits
11cc04d
CLI support for SmartSwitch PMON
rameshraghupathy Apr 14, 2024
02df0ea
imad minor fixes
rameshraghupathy Apr 16, 2024
e0e4700
Did some cleanup for backward compatibility
rameshraghupathy Apr 27, 2024
0a8fc5a
removed the column wrapping
rameshraghupathy Apr 27, 2024
6d61faa
Made it backward compatible and removed textwrap and added ut to PR
rameshraghupathy Apr 28, 2024
8d95dae
1. There was a duplication of part of a function and that has been
rameshraghupathy May 1, 2024
5c1b666
reboot_cause and system_health are obtained directly from chassisStateDB
rameshraghupathy May 8, 2024
fe4a8cf
The expected and result are the same but the test is throwing an error,
rameshraghupathy May 8, 2024
f896438
Let us get the build going and then look into the test mockup
rameshraghupathy May 9, 2024
9d6c093
Implemented as per the pmon hld, also made some improvements in the
rameshraghupathy May 10, 2024
0904515
Fixed the key for CHASSIS_MODULE_INFO_TABLE entries
rameshraghupathy May 15, 2024
ccc380b
Fixed "show reboot-cause all" and "show reboot-cause history all"
rameshraghupathy May 16, 2024
a8fa81d
Addressing review comments
rameshraghupathy May 31, 2024
1cf96a0
Checking if the test issue still exists
rameshraghupathy May 31, 2024
64fd559
Resolving SA errors triggered due to reboot_cause_test
rameshraghupathy Jun 1, 2024
d202e1c
Resolved pre-commit issues
rameshraghupathy Jun 7, 2024
b8c92ae
Resolved pre-commit issues
rameshraghupathy Jun 7, 2024
9986f7b
Improving coverage
rameshraghupathy Jun 7, 2024
0dc52f6
Fixed SA related warnings
rameshraghupathy Jun 7, 2024
93df26d
Did some cleanup
rameshraghupathy Jun 7, 2024
7a2aaf4
Minor improvements and fixes
rameshraghupathy Jun 7, 2024
26f9b8a
Adding tests for system health
rameshraghupathy Jun 7, 2024
3a592f8
Adding more system health related tests
rameshraghupathy Jun 7, 2024
71472a8
Fixed a minor issue
rameshraghupathy Jun 7, 2024
fd8bd6b
Fixed long line SA issue
rameshraghupathy Jun 7, 2024
5b15bc4
Trying to please SA
rameshraghupathy Jun 7, 2024
b35c987
Trying to improve coverage
rameshraghupathy Jun 7, 2024
ee10649
import mock
rameshraghupathy Jun 8, 2024
27546a6
Fixed a typo
rameshraghupathy Jun 8, 2024
883e35c
mocking DB
rameshraghupathy Jun 8, 2024
713ffa2
Fixed syntax issues
rameshraghupathy Jun 8, 2024
62fc3d0
DB mock fix
rameshraghupathy Jun 8, 2024
ecb2ecc
removed unused import
rameshraghupathy Jun 8, 2024
e2eb660
creating ut for dpu state
rameshraghupathy Jun 8, 2024
ef87cb5
Improving coverage
rameshraghupathy Jun 8, 2024
53c2277
Fixed a typo
rameshraghupathy Jun 8, 2024
fb989e4
Adjusted the reboot-cause key as per the updated hld
rameshraghupathy Jun 13, 2024
8ea7960
Added fix to gracefully handle sytem health DB keys not present case
rameshraghupathy Jun 30, 2024
76de68a
Addressed minor review comments
rameshraghupathy Jul 9, 2024
a08e0cb
Addressed review comments. Commented out system-health support until
rameshraghupathy Jul 29, 2024
766b303
Resolved minor issues and SA failures
rameshraghupathy Jul 29, 2024
c474940
Added role to PORT table in config_db. Using role to differentiate
rameshraghupathy Aug 31, 2024
1910163
Resolving pre-commit check error related to line > 120
rameshraghupathy Aug 31, 2024
851dc78
Trying to avoid pre-commit issues
rameshraghupathy Aug 31, 2024
cb54b73
Testing SA and precommit checks
rameshraghupathy Aug 31, 2024
4dfb5f8
Making it backward compatible
rameshraghupathy Aug 31, 2024
6941baf
Resolving column size and whitespace issue
rameshraghupathy Sep 1, 2024
f3c8e36
Working on SA issue
rameshraghupathy Sep 1, 2024
6d7d539
Testing SA and UT
rameshraghupathy Sep 1, 2024
433bc50
Added 2 spaces before inline comment
rameshraghupathy Sep 1, 2024
3ddcc9c
Merge branch 'sonic-net:master' into master
rameshraghupathy Sep 1, 2024
95da5c0
Enabling "show system-health dpu" cli alone. The rest of the dpu health
rameshraghupathy Sep 4, 2024
627dd5e
Fixed SA issues
rameshraghupathy Sep 4, 2024
934e6ef
Adde new line at EOF
rameshraghupathy Sep 4, 2024
64d06ec
Enabling the UT for the CLI "show system-health dpu"
rameshraghupathy Sep 4, 2024
4870a86
Resolved SA issues
rameshraghupathy Sep 4, 2024
fed3f67
Resolved a SA issue
rameshraghupathy Sep 4, 2024
68b6416
Added smartswitch specific "reboot-cause" and "reboot-cause history" CLI
rameshraghupathy Sep 24, 2024
d229307
Removed the phase:2 related system-health cli extensions as a seperate
rameshraghupathy Sep 24, 2024
78e71c5
Using smartswitch qualifier for the clie extensions
rameshraghupathy Sep 28, 2024
d7fbe9d
Fixed SA issues
rameshraghupathy Sep 28, 2024
313a9d2
mocking device_info for test cases
rameshraghupathy Sep 28, 2024
0ea1227
import patch in tests
rameshraghupathy Sep 28, 2024
f5f88bb
Debugging test failure
rameshraghupathy Sep 28, 2024
62817ea
Fixing SA issues
rameshraghupathy Sep 28, 2024
9fb005d
fixing sa issues
rameshraghupathy Sep 28, 2024
7c8c5d7
Debugging sa issues
rameshraghupathy Sep 28, 2024
b5b068b
trying to resolve sa issues
rameshraghupathy Sep 28, 2024
25259cb
fixed indentation
rameshraghupathy Sep 28, 2024
808e7b4
debugging
rameshraghupathy Sep 28, 2024
7eb8304
debugging
rameshraghupathy Sep 28, 2024
44bed5c
debugging
rameshraghupathy Sep 28, 2024
d7fd0ce
debugging
rameshraghupathy Sep 28, 2024
b0e51f8
Debugging
rameshraghupathy Sep 29, 2024
ed742fc
debugging
rameshraghupathy Sep 29, 2024
11f48f3
debugging
rameshraghupathy Sep 29, 2024
402887d
Debugging
rameshraghupathy Sep 29, 2024
8db11f3
Debugging
rameshraghupathy Sep 29, 2024
2ab48b5
Debuggingg
rameshraghupathy Sep 29, 2024
e843fff
Debugging
rameshraghupathy Sep 29, 2024
9ba21d2
Debugging
rameshraghupathy Sep 29, 2024
738634d
Debugging
rameshraghupathy Sep 29, 2024
c491687
Debugging
rameshraghupathy Sep 29, 2024
ee3f927
Debugging
rameshraghupathy Sep 29, 2024
d47a431
Debugging
rameshraghupathy Sep 29, 2024
04c520e
Debugging
rameshraghupathy Sep 29, 2024
c5abc01
Debugging
rameshraghupathy Sep 29, 2024
6ab7742
Debugging
rameshraghupathy Sep 29, 2024
4299ac3
Debugging
rameshraghupathy Sep 29, 2024
d30ead7
Debugging
rameshraghupathy Sep 29, 2024
a07e8c0
Debugging
rameshraghupathy Sep 29, 2024
a2cece6
Debugging
rameshraghupathy Sep 29, 2024
e2b65af
Debugging
rameshraghupathy Sep 29, 2024
53909f0
Debugging
rameshraghupathy Sep 29, 2024
9849436
Debugging
rameshraghupathy Sep 29, 2024
02152e3
Debuggingg
rameshraghupathy Sep 29, 2024
a75a4d3
Debugging
rameshraghupathy Sep 29, 2024
f8a1f57
Debugging
rameshraghupathy Sep 29, 2024
29000c3
Debugging
rameshraghupathy Sep 29, 2024
e273a16
Debugging
rameshraghupathy Sep 29, 2024
d720cf6
Debugging
rameshraghupathy Sep 29, 2024
c6040b3
Debugging
rameshraghupathy Sep 29, 2024
864c96c
Debugging
rameshraghupathy Sep 29, 2024
8580f76
Debugging
rameshraghupathy Sep 29, 2024
f4942b7
Debugging
rameshraghupathy Sep 29, 2024
3e44844
Debugging
rameshraghupathy Sep 29, 2024
e7355b0
Debugging
rameshraghupathy Sep 30, 2024
b132f90
Debugging
rameshraghupathy Sep 30, 2024
781270a
Debugging
rameshraghupathy Sep 30, 2024
2e8813b
Debugging
rameshraghupathy Sep 30, 2024
6cba5ed
Removing the test to build an image
rameshraghupathy Sep 30, 2024
5db0bc2
Removed mock import
rameshraghupathy Sep 30, 2024
807529f
Improving coverage
rameshraghupathy Sep 30, 2024
885b168
pleasing SA
rameshraghupathy Sep 30, 2024
b6efa8c
Fixing tests for design changes as per review comments
rameshraghupathy Sep 30, 2024
4c26a25
Resolving test failure
rameshraghupathy Sep 30, 2024
ed3d24b
fixed indentation
rameshraghupathy Sep 30, 2024
68a9efe
cleaned up the test case
rameshraghupathy Oct 1, 2024
d09d58f
Addressed review comments in Command-Reference.md and trying to improve
rameshraghupathy Oct 1, 2024
c217c18
Improving coverage
rameshraghupathy Oct 1, 2024
df87438
Fixed a test issue
rameshraghupathy Oct 1, 2024
2dfc2b5
Addressed review comments
rameshraghupathy Oct 7, 2024
c261b0c
Addressed review comment. Reading DPUs list from config_db.json
rameshraghupathy Oct 8, 2024
ab200bc
Improving coverage
rameshraghupathy Oct 8, 2024
5e36792
Resolved SA error
rameshraghupathy Oct 8, 2024
4a43780
Trying to improve coverage. Also, reading from platform.json
rameshraghupathy Oct 8, 2024
8b2c9cb
adding json import in the test
rameshraghupathy Oct 8, 2024
155ba3f
Fixed a test failure
rameshraghupathy Oct 8, 2024
e8c8b42
Fixed SA error
rameshraghupathy Oct 8, 2024
9601177
Exercising the new function in test
rameshraghupathy Oct 9, 2024
9713bf7
Removed a blank line
rameshraghupathy Oct 9, 2024
fdf8569
fixing mock issue
rameshraghupathy Oct 9, 2024
4b30138
Trying a different approach
rameshraghupathy Oct 9, 2024
e725add
working on coverage
rameshraghupathy Oct 9, 2024
d2e7590
debugging
rameshraghupathy Oct 9, 2024
3e1fc12
debugging
rameshraghupathy Oct 9, 2024
51dce03
Debugging
rameshraghupathy Oct 9, 2024
a016ead
Increasing coverage
rameshraghupathy Oct 9, 2024
041fad6
improving coverage
rameshraghupathy Oct 9, 2024
5c85cf4
Adjusting the show cli implementation to align with the reboot-cause
rameshraghupathy Oct 23, 2024
1b3fabb
Fixing a minor issue
rameshraghupathy Oct 23, 2024
9a0225b
Removed ID column from the "show system-health dpu DPUx" cli as per t…
rameshraghupathy Oct 25, 2024
8f191d6
Addressed default dpu admin status for dark-mode and seamless migration
rameshraghupathy Oct 29, 2024
523a42c
Resolving SA issue
rameshraghupathy Oct 29, 2024
a90b878
Resolved a typo
rameshraghupathy Oct 30, 2024
594a9dc
Added checks to see if module_name is valid in the "config chassis
rameshraghupathy Nov 20, 2024
79666d1
Fixed white space issues
rameshraghupathy Nov 20, 2024
9bb29e3
Cleaned unwanted import
rameshraghupathy Nov 20, 2024
63d5f9f
Fixed build issues
rameshraghupathy Nov 20, 2024
1255ee6
missedout the fixes in a couple of files
rameshraghupathy Nov 20, 2024
d630304
With the recent code the app_db multi_asic.PORT_ROLE is Dpc for DPU
rameshraghupathy Nov 26, 2024
933c04e
As the port role issue is no longer seen in smartswitch, cleaning up the
rameshraghupathy Nov 26, 2024
5a4c7fd
Using the verbose define for TYPE_DPC in the CLI, if there is a specific
rameshraghupathy Nov 26, 2024
989fa80
Reverting intfutil_test.py
rameshraghupathy Nov 26, 2024
00df371
Using the common API to get_dpu_list
rameshraghupathy Dec 4, 2024
48c8419
Removed unused import json
rameshraghupathy Dec 4, 2024
be8d747
Addressed review comments
rameshraghupathy Dec 6, 2024
0764a34
Did some minor cleanp
rameshraghupathy Dec 6, 2024
54cfbab
Fix: SA error
rameshraghupathy Dec 6, 2024
00c0ee0
Addressed review comments
rameshraghupathy Dec 27, 2024
b43f72b
Addressed review comments
rameshraghupathy Dec 27, 2024
ec47fa2
Addressed review comments
rameshraghupathy Dec 27, 2024
8432ed8
Addressed review comments
rameshraghupathy Dec 27, 2024
df2517b
Addressed review comments
rameshraghupathy Dec 27, 2024
d30b4fb
Addressed review comments
rameshraghupathy Dec 27, 2024
3274de0
Addressed review comments
rameshraghupathy Dec 27, 2024
c53685f
Addressed review comments
rameshraghupathy Dec 27, 2024
2b77e74
Addressed review comments
rameshraghupathy Dec 27, 2024
d46bf3a
Addressed review comments
rameshraghupathy Dec 27, 2024
513f21d
Addressed review comments
rameshraghupathy Dec 28, 2024
8da07e1
Addressed review comments
rameshraghupathy Dec 28, 2024
6796e67
Addressed review comments
rameshraghupathy Dec 28, 2024
e89daf7
Addressed review comments
rameshraghupathy Dec 28, 2024
2ccb4c3
Addressed review comments
rameshraghupathy Dec 28, 2024
d0f02f7
Addressed review comments
rameshraghupathy Dec 28, 2024
8b86eee
Addressed review comments
rameshraghupathy Dec 28, 2024
4ed816f
Addressed review comments
rameshraghupathy Dec 28, 2024
6 changes: 4 additions & 2 deletions config/chassis_modules.py
@@ -30,8 +30,10 @@ def shutdown_chassis_module(db, chassis_module_name):

     if not chassis_module_name.startswith("SUPERVISOR") and \
        not chassis_module_name.startswith("LINE-CARD") and \
-       not chassis_module_name.startswith("FABRIC-CARD"):
-        ctx.fail("'module_name' has to begin with 'SUPERVISOR', 'LINE-CARD' or 'FABRIC-CARD'")
+       not chassis_module_name.startswith("FABRIC-CARD") and \

Contributor

@gpunathilell gpunathilell Oct 15, 2024

We need additional validation to check that chassis_module_name is actually present (that is, that it is a valid module name on this system). If a user executes "config chassis modules startup DPU5" on a system that does not have DPU5, the SmartSwitchConfigManagerTask in chassisd crashes, which prevents further startup or shutdown calls. Even though the command prints "Starting up chassis module DPU1" or "Shutting down chassis module DPU1", the only operation actually performed is adding or removing the entry in CONFIG_DB.

Author

@rameshraghupathy rameshraghupathy Nov 20, 2024

+       not chassis_module_name.startswith("DPU") and \
+       not chassis_module_name.startswith("SWITCH"):
+        ctx.fail("'module_name' has to begin with 'SUPERVISOR', 'LINE-CARD', 'FABRIC-CARD', 'DPU' or 'SWITCH'")

     fvs = {'admin_status': 'down'}
     config_db.set_entry('CHASSIS_MODULE', chassis_module_name, fvs)
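The review comment in this hunk asks for a check that the named module actually exists, not only that its prefix is recognized. A minimal sketch of such a check follows. It is an illustration only, not the change made in this PR: `validate_module_name` and `known_modules` are hypothetical names, and how the set of present modules would be obtained is left as an assumption.

```python
# Hypothetical helper; the name and the source of `known_modules` are assumptions.
VALID_PREFIXES = ("SUPERVISOR", "LINE-CARD", "FABRIC-CARD", "DPU", "SWITCH")


def validate_module_name(name, known_modules):
    """Return an error message, or None when `name` is acceptable."""
    # str.startswith accepts a tuple, so one call covers every prefix.
    if not name.startswith(VALID_PREFIXES):
        return ("'module_name' has to begin with 'SUPERVISOR', 'LINE-CARD', "
                "'FABRIC-CARD', 'DPU' or 'SWITCH'")
    # Reject well-formed names that do not exist on this system (e.g. DPU5).
    if name not in known_modules:
        return "'{}' is not present on this system".format(name)
    return None
```

In a command such as the one above, a non-None result could be passed to `ctx.fail` before anything is written to CONFIG_DB, so that an unknown name never reaches chassisd.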
145 changes: 118 additions & 27 deletions show/reboot_cause.py
100755 → 100644
@@ -4,6 +4,7 @@

 import click
 from tabulate import tabulate
+import textwrap
 from swsscommon.swsscommon import SonicV2Connector
 import utilities_common.cli as clicommon

@@ -23,6 +24,100 @@ def read_reboot_cause_file():

     return reboot_cause_dict

+# Function to fetch reboot cause data from database
+def fetch_reboot_cause_from_db(module_name):
+    table = []
+    r = []
+    wrapper = textwrap.TextWrapper(width=30)

+    # Read the previous reboot cause
+    if module_name == "all" or module_name == "SWITCH":
+        reboot_cause_dict = read_reboot_cause_file()
+        reboot_cause = reboot_cause_dict.get("cause", "Unknown")
+        reboot_user = reboot_cause_dict.get("user", "N/A")
+        reboot_time = reboot_cause_dict.get("time", "N/A")

+        r.append("SWITCH")
+        r.append(reboot_cause if reboot_cause else "")
+        r.append(reboot_time if reboot_time else "")
+        r.append(reboot_user if reboot_user else "")
+        table.append(r)

+    if module_name == "SWITCH":
+        return table

+    REBOOT_CAUSE_TABLE_NAME = "REBOOT_CAUSE"
+    TABLE_NAME_SEPARATOR = '|'
+    db = SonicV2Connector(host='127.0.0.1')
+    db.connect(db.STATE_DB, False) # Make one attempt only
+    prefix = REBOOT_CAUSE_TABLE_NAME + TABLE_NAME_SEPARATOR
+    _hash = '{}{}'.format(prefix, '*')
+    table_keys = db.keys(db.STATE_DB, _hash)
+    if table_keys is not None:
+        table_keys.sort(reverse=True)

+    d = []
+    append = False
+    for tk in table_keys:
+        r = []
+        entry = db.get_all(db.STATE_DB, tk)
+        if 'device' in entry:
+            if module_name != entry['device'] and module_name != "all":
+                continue
+            if entry['device'] in d:
+                append = False
+                continue
+            else:
+                append = True
+                d.append(entry['device'])
+        if not module_name is None:
+            r.append(entry['device'] if 'device' in entry else "")
+        if 'cause' in entry:
+            wrp_cause = wrapper.fill(entry['cause'])
+        r.append(wrp_cause if 'cause' in entry else "")
+        r.append(entry['time'] if 'time' in entry else "")
+        r.append(entry['user'] if 'user' in entry else "")
+        if append == True:
+            table.append(r)

+    return table

+# Function to fetch reboot cause history data from database
+def fetch_reboot_cause_history_from_db(module_name):
+    REBOOT_CAUSE_TABLE_NAME = "REBOOT_CAUSE"
+    TABLE_NAME_SEPARATOR = '|'
+    db = SonicV2Connector(host='127.0.0.1')
+    db.connect(db.STATE_DB, False) # Make one attempt only
+    prefix = REBOOT_CAUSE_TABLE_NAME + TABLE_NAME_SEPARATOR
+    _hash = '{}{}'.format(prefix, '*')
+    table_keys = db.keys(db.STATE_DB, _hash)
+    wrapper = textwrap.TextWrapper(width=30)

+    if table_keys is not None:
+        table_keys.sort(reverse=True)

+    table = []
+    device_present = False
+    for tk in table_keys:
+        entry = db.get_all(db.STATE_DB, tk)
+        if 'device' in entry:
+            device_present = True
+        r = []
+        if not module_name is None and device_present:
+            r.append(entry['device'] if 'device' in entry else "SWITCH")
+        r.append(tk.replace(prefix, ""))
+        if 'cause' in entry:
+            wrp_cause = wrapper.fill(entry['cause'])
+        r.append(wrp_cause if 'cause' in entry else "")
+        r.append(entry['time'] if 'time' in entry else "")
+        r.append(entry['user'] if 'user' in entry else "")
+        if 'comment' in entry:
+            wrp_comment = wrapper.fill(entry['comment'])
+        r.append(wrp_comment if 'comment' in entry else "")
+        if module_name == 'all' or module_name == entry['device']:
+            table.append(r)

+    return table

 #
 # 'reboot-cause' group ("show reboot-cause")
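`fetch_reboot_cause_from_db` in the hunk above sorts the keys newest-first and keeps the first entry it meets for each device. A simplified sketch of that de-duplication on plain dictionaries follows; it is not the PR's code. `latest_cause_per_device` is a hypothetical name, the keys are assumed to sort chronologically as strings, and entries without a `device` field are grouped under an empty device name.

```python
def latest_cause_per_device(entries, module_name="all"):
    """Return one [device, cause, time, user] row per device, newest first.

    `entries` maps a STATE_DB-style key to its field dictionary.
    """
    seen = set()
    rows = []
    # Newest key first, so the first entry seen per device is the latest one.
    for key in sorted(entries, reverse=True):
        entry = entries[key]
        device = entry.get("device", "")
        if module_name != "all" and device != module_name:
            continue
        if device in seen:
            continue  # an older entry for a device already reported
        seen.add(device)
        rows.append([device,
                     entry.get("cause", ""),
                     entry.get("time", ""),
                     entry.get("user", "")])
    return rows
```

Using `dict.get` with a default means a missing field produces an empty cell rather than an exception, and a set makes the "already reported" test independent of the number of devices.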
@@ -61,34 +156,30 @@ def reboot_cause(ctx):

         click.echo(reboot_cause_str)

+# 'all' command within 'reboot-cause'
+@reboot_cause.command()
+def all():
+    """Show cause of most recent reboot"""
+    reboot_cause_data = fetch_reboot_cause_from_db("all")
+    if not reboot_cause_data:
+        click.echo("Reboot-cause history is not yet available in StateDB")
+    else:
+        header = ['Device', 'Name', 'Cause', 'Time', 'User']
+        click.echo(tabulate(reboot_cause_data, header, numalign="left"))

-# 'history' subcommand ("show reboot-cause history")
+# 'history' command within 'reboot-cause'
 @reboot_cause.command()
-def history():
+@click.argument('module_name', required=False)
+def history(module_name):
     """Show history of reboot-cause"""
-    REBOOT_CAUSE_TABLE_NAME = "REBOOT_CAUSE"
-    TABLE_NAME_SEPARATOR = '|'
-    db = SonicV2Connector(host='127.0.0.1')
-    db.connect(db.STATE_DB, False) # Make one attempt only
-    prefix = REBOOT_CAUSE_TABLE_NAME + TABLE_NAME_SEPARATOR
-    _hash = '{}{}'.format(prefix, '*')
-    table_keys = db.keys(db.STATE_DB, _hash)
-    if table_keys is not None:
-        table_keys.sort(reverse=True)

-        table = []
-        for tk in table_keys:
-            entry = db.get_all(db.STATE_DB, tk)
-            r = []
-            r.append(tk.replace(prefix, ""))
-            r.append(entry['cause'] if 'cause' in entry else "")
-            r.append(entry['time'] if 'time' in entry else "")
-            r.append(entry['user'] if 'user' in entry else "")
-            r.append(entry['comment'] if 'comment' in entry else "")
-            table.append(r)

-        header = ['Name', 'Cause', 'Time', 'User', 'Comment']
-        click.echo(tabulate(table, header, numalign="left"))
-    else:
+    reboot_cause_history = fetch_reboot_cause_history_from_db(module_name)
+    if not reboot_cause_history:
         click.echo("Reboot-cause history is not yet available in StateDB")
         sys.exit(1)
+    else:
+        if not module_name is None :
+            header = ['Device', 'Name', 'Cause', 'Time', 'User', 'Comment']
+            click.echo(tabulate(reboot_cause_history, header, numalign="left"))
+        else:
+            header = ['Name', 'Cause', 'Time', 'User', 'Comment']
+            click.echo(tabulate(reboot_cause_history, header, numalign="left"))
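In the `all` command above the header names five columns, while the rows built by `fetch_reboot_cause_from_db` carry four values (device, cause, time, user), so header and rows do not line up one-to-one. One way to avoid that class of mismatch is to build the header and the rows in the same place. The following is a hedged sketch for the history view only; `history_table` and `HISTORY_COLUMNS` are hypothetical names, and entries without a `device` field are treated as belonging to SWITCH, following the default used in the diff.

```python
import textwrap

HISTORY_COLUMNS = ["Name", "Cause", "Time", "User", "Comment"]


def history_table(entries, prefix, module_name=None, width=30):
    """Return (header, rows) where every row has exactly len(header) cells."""
    with_device = module_name is not None
    header = (["Device"] if with_device else []) + HISTORY_COLUMNS
    rows = []
    for key in sorted(entries, reverse=True):  # newest key first
        entry = entries[key]
        device = entry.get("device", "SWITCH")
        # No filter when module_name is None or "all"; otherwise match the device.
        if module_name not in (None, "all", device):
            continue
        row = [
            key.replace(prefix, ""),
            textwrap.fill(entry.get("cause", ""), width=width),
            entry.get("time", ""),
            entry.get("user", ""),
            textwrap.fill(entry.get("comment", ""), width=width),
        ]
        if with_device:
            row.insert(0, device)
        rows.append(row)
    return header, rows
```

The returned pair could be handed to `tabulate(rows, header, numalign="left")` as in the diff. Because the Device column is added to the header and to each row under the same condition, the two cannot drift apart.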

146 changes: 135 additions & 11 deletions show/system_health.py
@@ -4,7 +4,11 @@
 import click
 from tabulate import tabulate
 import utilities_common.cli as clicommon
+from swsscommon.swsscommon import SonicV2Connector

+DPU_STATE = 'DPU_STATE'
+CHASSIS_SERVER='redis_chassis.server'
+CHASSIS_SERVER_PORT=6380

 def get_system_health_status():
     if os.environ.get("UTILITIES_UNIT_TESTING") == "1":
@@ -33,6 +37,90 @@ def get_system_health_status():

     return manager, chassis, stat

+def get_module_health(module_name):
+    try:
+        _, chassis, _ = get_system_health_status()
+        moduleindex = chassis.get_module_index(module_name)
+        if moduleindex:
+            module = chassis.get_module(moduleindex)
+            health_results = module.get_health_info()
+            if health_results:
+                return health_results.summary, health_results.monitorlist
+    except Exception as e:
+        click.echo("Error retrieving module health list:", e)
+        exit(1)

+def show_module_health_all(mode):
+    _, chassis, _ = get_system_health_status()
+    for index, mod in enumerate(chassis._module_list):
+        module_name = mod.get_name()
+        if "DPU" in module_name:
+            health_summary, health_monitorlist = get_module_health(module_name)
+            if mode == "monitorlist":
+                display_monitor_list(health_monitorlist)
+            elif mode == "summary":
+                display_monitor_list(health_summary)
+            else:
+                display_monitor_list(health_summary)
+                display_monitor_list(health_monitorlist)

+def show_module_state(module_name):
+    chassis_state_db = SonicV2Connector(host=CHASSIS_SERVER, port=CHASSIS_SERVER_PORT)
+    chassis_state_db.connect(chassis_state_db.CHASSIS_STATE_DB)

+    key_pattern = '*' if not module_name else '|' + module_name

+    keys = chassis_state_db.keys(chassis_state_db.CHASSIS_STATE_DB, DPU_STATE + key_pattern)
+    if not keys:
+        print('Key {} not found in {} table'.format(key_pattern, DPU_STATE))
+        return

+    table = []
+    for dbkey in natsorted(keys):
+        key_list = dbkey.split('|')
+        if len(key_list) != 2: # error data in DB, log it and ignore
+            print('Warn: Invalid Key {} in {} table'.format(dbkey, DPU_STATE))
+            continue

+        state_info = chassis_state_db.get_all(chassis_state_db.CHASSIS_STATE_DB, dbkey)

+        # Determine operational status
+        dpu_states = [value for key, value in state_info.items() if key.startswith('dpu')]
+        if all(state == "up" for state in dpu_states):
+            oper_status = "Online"
+        elif any(state == "up" for state in dpu_states):
+            oper_status = "Partial Online"
+        else:
+            oper_status = "Offline"

+        row = [module_name, state_info.get('id', ''), oper_status, "", "", "", ""]
+        for key, value in state_info.items():
+            if key.startswith('dpu'):
+                if key.endswith('_time'):
+                    row[5] = value
+                elif key.endswith('_reason'):
+                    row[6] = value
+                if not key.endswith('_state'):
+                    row[0] = ""
+                    row[1] = ""
+                    row[2] = ""
+                    table.append(row)
+                else:
+                    state_detail = key
+                    row[3] = state_detail
+                    row[4] = value
+                    table.append(row)

+    headers = ["Name", "ID", "Oper-Status", "State-Detail", "State-Value", "Time", "Reason"]
+    click.echo(tabulate(table, headers=headers))

+def show_module_state_all():
+    _, chassis, _ = get_system_health_status()
+    for index, mod in enumerate(chassis._module_list):
+        module_name = mod.get_name()
+        if "DPU" in module_name:
+            show_module_state(module_name)

 def display_system_health_summary(stat, led):
     click.echo("System status summary\n\n System status LED " + led)
     services_list = []
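`show_module_state` above derives Oper-Status from every field whose key starts with `dpu`, which also includes the `_time` and `_reason` values, and `all()` over an empty list is True in Python. A minimal sketch that looks only at the `_state` fields and treats "no state fields" as Offline is given below. It is not the PR's implementation: `derive_oper_status` is a hypothetical name, the Offline default for an empty entry is an assumption, and the field names in the usage example are illustrative.

```python
def derive_oper_status(state_info):
    """Map a DPU_STATE-style field dictionary to an Oper-Status string."""
    # Only the *_state fields carry "up"/"down"; timestamps and reasons are skipped.
    states = [value for key, value in state_info.items()
              if key.startswith("dpu") and key.endswith("_state")]
    if not states:
        # Assumption: an entry with no state fields is reported as Offline,
        # because all([]) would otherwise evaluate to True.
        return "Offline"
    if all(state == "up" for state in states):
        return "Online"
    if any(state == "up" for state in states):
        return "Partial Online"
    return "Offline"
```

For example, with an illustrative entry such as `{"dpu_midplane_link_state": "up", "dpu_midplane_link_time": "...", "dpu_control_plane_state": "down"}` the sketch reports "Partial Online", and the timestamp field has no influence on the result.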
@@ -108,27 +196,63 @@ def system_health():
     return

 @system_health.command()
-def summary():
+@click.argument('module_name', required=False)
+def summary(module_name):
     """Show system-health summary information"""
-    _, chassis, stat = get_system_health_status()
-    display_system_health_summary(stat, chassis.get_status_led())

+    if not module_name or module_name == "all":
+        _, chassis, stat = get_system_health_status()
+        display_system_health_summary(stat, chassis.get_status_led())
+    elif module_name.startswith("DPU"):
+        health_summary, _ = get_module_health(module_name)
+        display_monitor_list(health_summary)
+    elif module_name == "all":
+        show_module_health_all("summary")
+    else:
+        click.echo("Valid module-names are DPU0, DPU1, ...")

 @system_health.command()
-def detail():
+@click.argument('module_name', required=False)
+def detail(module_name):
     """Show system-health detail information"""
     manager, chassis, stat = get_system_health_status()
-    display_system_health_summary(stat, chassis.get_status_led())
-    display_monitor_list(stat)
-    display_ignore_list(manager)

+    if not module_name or module_name == "all":
+        display_system_health_summary(stat, chassis.get_status_led())
+        display_monitor_list(stat)
+        display_ignore_list(manager)
+    elif module_name.startswith("DPU"):
+        health_summary, health_monitorlist = get_module_health(module_name)
+        display_monitor_list(health_summary)
+        display_monitor_list(health_monitorlist)
+    elif module_name.startswith("all"):
+        show_module_health_all("detail")
+    else:
+        click.echo("Valid module-names are DPU0, DPU1, ...")

 @system_health.command()
-def monitor_list():
+@click.argument('module_name', required=False)
+def monitor_list(module_name):
     """Show system-health monitored services and devices name list"""
     _, _, stat = get_system_health_status()
-    display_monitor_list(stat)
+    if not module_name or module_name == "all":
+        display_monitor_list(stat)
+    elif module_name.startswith("DPU"):
+        _, health_monitorlist = get_module_health(module_name)
+        display_monitor_list(health_monitorlist)
+    elif module_name == "all":
+        show_module_health_all("monitorlist")
+    else:
+        click.echo("Valid module-names are DPU0, DPU1, ...")

+@system_health.command()
+@click.argument('module_name', required=False)
+def dpu(module_name):
+    """Show system-health dpu information"""
+    if module_name.startswith("DPU"):
+        show_module_state(module_name)
+    elif module_name == "all":
+        show_module_state_all()
+    else:
+        click.echo("Valid module-names are DPU0, DPU1, ...")

 @system_health.group('sysready-status',invoke_without_command=True)
 @click.pass_context
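Two details of the argument handling in this hunk are worth spelling out. In `summary` and `monitor_list` the first branch already matches "all", so the later `elif module_name == "all"` branch cannot be reached. In `dpu`, an omitted optional argument arrives as None, so calling `startswith` on it raises AttributeError. A small sketch of a single classification step that the four commands could share follows; `classify_module_arg` and its return labels are hypothetical, and the strict `DPU<number>` pattern is an assumption based on the help text "DPU0, DPU1, ...".

```python
import re


def classify_module_arg(module_name):
    """Classify the optional MODULE_NAME argument of a show command."""
    if module_name is None:
        return "system"      # no argument: whole-system view
    if module_name == "all":
        return "all-dpus"    # checked before any prefix test, so it stays reachable
    if re.fullmatch(r"DPU\d+", module_name):
        return "dpu"         # a single DPU such as DPU0
    return "invalid"
```

Each command would then branch on the returned label, printing the "Valid module-names are DPU0, DPU1, ..." message for "invalid". Testing for None first is what keeps the later string checks safe.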