Chassis DB Consistency Test Plan
| Rev | Date | Author | Change Description |
|---|---|---|---|
| 0.1 | 08/02/2023 | Junhong Mao | Initial Draft |
This test plan verifies that the chassis database stays consistent when one or more line cards are rebooted, pulled out, lose midplane connectivity, or fail to boot the default image.
The test environment is a Nokia 7250 IXR-10e Interconnect Router.
With all line cards (for example, ixre-egl-board40 and ixre-egl-board41) plugged in and working normally, log in to the chassis CPM board (for example, ixre-cpm-chassis15).
Verify the database by running the shell script db-con.sh, shown below, on the supervisor:
$ cat db-con.sh
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_NEIGH|ixre-egl-board40*"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_INTERFACE|ixre-egl-board40*"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_LAG_MEMBER_TABLE|ixre-egl-board40*"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_LAG_TABLE|ixre-egl-board40*"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_LAG_ID_TABLE|ixre-egl-board40"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_NEIGH|ixre-egl-board41*"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_INTERFACE|ixre-egl-board41*"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_LAG_MEMBER_TABLE|ixre-egl-board41*"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_LAG_TABLE|ixre-egl-board41*"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_LAG_ID_TABLE|ixre-egl-board41"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_LAG_ID_SET"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_LAG_ID_TABLE"
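The script above repeats the same queries for each line card. A minimal sketch of a parameterized variant follows, assuming the same Redis host, port, and database number as db-con.sh. The function names (`linecard_patterns`, `global_patterns`, `all_patterns`, `dump_all`) are hypothetical and not part of the original script.

```shell
#!/bin/sh
# Connection settings copied from db-con.sh.
REDIS_HOST="10.6.0.100"
REDIS_PORT=6380
REDIS_DB=12

# Print the key patterns that db-con.sh queries for one line card.
linecard_patterns() {
    for table in SYSTEM_NEIGH SYSTEM_INTERFACE SYSTEM_LAG_MEMBER_TABLE SYSTEM_LAG_TABLE; do
        printf '%s|%s*\n' "$table" "$1"
    done
    # db-con.sh queries this table without a trailing wildcard.
    printf '%s|%s\n' "SYSTEM_LAG_ID_TABLE" "$1"
}

# Print the chassis-wide key patterns that db-con.sh queries last.
global_patterns() {
    printf '%s\n' "SYSTEM_LAG_ID_SET" "SYSTEM_LAG_ID_TABLE"
}

# Print every pattern for the given line cards, in db-con.sh order.
all_patterns() {
    for card in "$@"; do
        linecard_patterns "$card"
    done
    global_patterns
}

# Dump each pattern with redis-dump (must be installed on the supervisor).
dump_all() {
    all_patterns "$@" | while IFS= read -r pattern; do
        redis-dump -H "$REDIS_HOST" -p "$REDIS_PORT" -d "$REDIS_DB" -y -k "$pattern" < /dev/null
    done
}
```

Calling `dump_all ixre-egl-board40 ixre-egl-board41` on the supervisor would issue the same twelve queries as db-con.sh.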
Pass
If the contents are valid and follow the format below:
{
"SYSTEM_NEIGH|ixre-egl-board40|asic0|Ethernet-IB0|3.3.3.1": {
"expireat": 1690815616.4330785,
"ttl": -0.001,
"type": "hash",
"value": {
"encap_index": "1074790404",
"neigh": "40:7c:7d:bb:26:15"
}
},
......
Fail
If the contents are empty
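The pass/fail decision above comes down to whether a dump is populated or empty. A minimal sketch of that check follows, assuming that redis-dump prints either nothing or an empty JSON object (`{}`) when no key matches. The function names are hypothetical, and the check does not validate individual fields.

```shell
#!/bin/sh
# Return 0 if a redis-dump result holds no keys, 1 otherwise.
# Assumption: an empty result is blank or an empty JSON object "{}".
dump_is_empty() {
    stripped=$(printf '%s' "$1" | tr -d '[:space:]')
    [ -z "$stripped" ] || [ "$stripped" = "{}" ]
}

# Print "empty" or "populated" for a redis-dump result passed as $1.
classify_dump() {
    if dump_is_empty "$1"; then
        echo "empty"
    else
        echo "populated"
    fi
}
```

For example, `classify_dump "$(redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k 'SYSTEM_NEIGH|ixre-egl-board40*')"` reports the state of one table.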
Reboot one or more line cards by running the following command on each line card:
sudo reboot
Run db-con.sh on the supervisor to check whether the related contents are cleaned up as part of the reboot process.
Pass
If the contents are cleaned up during boot and become valid again afterwards.
Valid content follows the format below. The contents are empty if they have been cleaned up.
{
"SYSTEM_NEIGH|ixre-egl-board40|asic0|Ethernet-IB0|3.3.3.1": {
"expireat": 1690815616.4330785,
"ttl": -0.001,
"type": "hash",
"value": {
"encap_index": "1074790404",
"neigh": "40:7c:7d:bb:26:15"
}
},
......
Fail
Otherwise
Pull out one or more line cards.
(1) Run db-con.sh on the supervisor both before and after 30 minutes have elapsed since the line card was pulled out.
(2) Check the related syslog with the command below:
tail -f /var/log/syslog
Valid content follows the format below.
{
"SYSTEM_NEIGH|ixre-egl-board40|asic0|Ethernet-IB0|3.3.3.1": {
"expireat": 1690815616.4330785,
"ttl": -0.001,
"type": "hash",
"value": {
"encap_index": "1074790404",
"neigh": "40:7c:7d:bb:26:15"
}
},
......
The contents are empty if they were cleaned up.
A sample of the syslog is shown below:
Aug 1 20:41:49.069227 ixre-cpm-chassis15 NOTICE pmon#chassisd: Module LINE-CARD0|ixre-egl-board40 is down for long time. Initiating chassis app db clean up
Aug 1 20:41:49.083447 ixre-cpm-chassis15 NOTICE pmon#chassisd: Cleaned up chassis app db entries for LINE-CARD0(ixre-egl-board40)/asic0
Aug 1 20:41:49.095707 ixre-cpm-chassis15 NOTICE pmon#chassisd: Cleaned up chassis app db entries for LINE-CARD0(ixre-egl-board40)/asic1
Pass
If the related contents are valid before the 30 minutes elapse and are cleaned up once the 30 minutes are up.
Fail
Otherwise
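The syslog check can be scripted rather than read by eye. The sketch below counts the "Cleaned up chassis app db entries" notices for one line card, using the message text from the sample syslog above. The function name `count_cleanups` is hypothetical.

```shell
#!/bin/sh
# Count clean-up notices for the line card named in $1.
# Syslog text is read from stdin; the count is printed to stdout.
count_cleanups() {
    grep -F "Cleaned up chassis app db entries for" | grep -cF "($1)" || true
}
```

On the supervisor this could be run as `count_cleanups ixre-egl-board40 < /var/log/syslog`. For the sample above it would report one notice per ASIC of the line card.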
Simulate the loss of midplane connectivity with the command below:
admin@ixre-egl-board40:~$ sudo ifconfig eth1-midplane down
After the midplane interface of one or more line cards is shut down, each affected line card reboots itself after 60 seconds because it is unable to reach the CPM. A message like the one below is logged in syslog.
ixre-egl-board40 login: 23-08-02 20:51:21.826 sr_device_mgr: Rebooting - Unable to reach CPM. Reboot self.
During the line card boot process, the database is cleaned up.
After the boot process completes, the database is repopulated.
Then check the syslog (with tail) and the database (with db-con.sh).
Pass
(1) If there is no database clean-up within 60 seconds after the loss of midplane connectivity,
(2) and there is a database clean-up during the line card boot process after the loss of midplane connectivity,
(3) and the database becomes normal after the line card boots up.
Fail
Otherwise
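The three pass conditions can be combined into a single verdict. The sketch below is illustrative only and assumes the tester records the database state at each of the three phases as either "populated" or "empty". The function name `midplane_verdict` and the state labels are hypothetical.

```shell
#!/bin/sh
# Arguments are the database state ("populated" or "empty") observed:
#   $1: within 60 seconds of the midplane connectivity loss
#   $2: while the line card is booting
#   $3: after the line card has booted
midplane_verdict() {
    if [ "$1" = "populated" ] && [ "$2" = "empty" ] && [ "$3" = "populated" ]; then
        echo "Pass"
    else
        echo "Fail"
    fi
}
```

For example, `midplane_verdict populated empty populated` matches the expected sequence, while an early clean-up (`empty` as the first argument) does not.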
Reboot one or more line cards by running the command below in a console session rather than an SSH session.
~ sudo reboot
In the GNU GRUB menu, select ONIE as shown below.
┌──────────────────────────────────────────────────┐
│                                                  │
│                                                  │
│*SONiC-OS-msft-2205-ndk.0-dirty-20230726.220238   │
│ SONiC-OS-msft-2205-ndk.0-dirty-20230723.234148   │
│ ONIE                                             │
This way, the default image (for example, SONiC-OS-msft-2205-ndk.0-dirty-20230726.220238) is not booted.
(1) Run db-con.sh on the supervisor during the first 30 minutes and again after 30 minutes.
(2) Check the related syslog with the command below:
tail -f /var/log/syslog
Pass
If the database is not cleaned up until 30 minutes have elapsed.
Fail
Otherwise