Fatal error handler #191

marcoaccame · 2021-06-22T11:09:29Z

This PR introduces a more powerful fatal error handler which:

* saves info about the fatal error on NZI RAM,
* forces a restart of the MPU,
* and then sends such info to yarprobotinterface.
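
The save / restart / report sequence can be illustrated with a small host-side simulation. This is only a sketch of the general pattern, not the firmware code: the `MAGIC` marker, the record layout, and the names `save_fatal_record` and `take_fatal_record` are all hypothetical, and a `bytearray` stands in for the NZI RAM region that survives the restart.

```python
import struct
from typing import Optional

# Hypothetical marker: a record is trusted only if this value is found intact.
MAGIC = 0xFA7A1E44
# Hypothetical layout: magic, handler id, error code, ipsr, tid, time in ms.
RECORD = struct.Struct("<IBBBBI")


def save_fatal_record(nzi_ram: bytearray, handler: int, code: int,
                      ipsr: int, tid: int, millisec: int) -> None:
    """Store the error info; meant to run just before the forced restart."""
    RECORD.pack_into(nzi_ram, 0, MAGIC, handler, code, ipsr, tid, millisec)


def take_fatal_record(nzi_ram: bytearray) -> Optional[dict]:
    """Run at bootstrap: return the saved record once, then invalidate it."""
    magic, handler, code, ipsr, tid, millisec = RECORD.unpack_from(nzi_ram, 0)
    if magic != MAGIC:
        return None  # no valid record: nothing to report
    # Clear the region so the same error is not reported after the next boot.
    nzi_ram[:RECORD.size] = bytes(RECORD.size)
    return {"handler": handler, "code": code, "ipsr": ipsr,
            "tid": tid, "millisec": millisec}
```

In this sketch the bootstrap code would call `take_fatal_record` once and, if it returns a record, emit the diagnostic messages; a second call returns `None` because the record has been cleared.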

This PR addresses the issue in here.
The relevant binaries are in this PR.

Description of the information sent to `yarprobotinterface`

In case of a fatal error, the board will restart and send messages such as:

[**INFO**]  from BOARD 10.0.1.1 (l-hv3-hand), src LOCAL, adr 0, time 1s 953m 401u: 
(code 0x0000003b, par16 0x0000 par64 0x0000000000000000) -> SYS: the board is bootstrapping + . 

[**ERROR**]  from BOARD 10.0.1.1 (l-hv3-hand), src LOCAL, adr 0, time 1s 955m 7u: 
(code 0x04000000, par16 0x0000 par64 0x0b0be50300004e20) -> DEBUG: tag00 + RESTARTED after FATAL error 

[**ERROR**]  from BOARD 10.0.1.1 (l-hv3-hand), src LOCAL, adr 0, time 1s 955m 117u: 
(code 0x04000000, par16 0x0000 par64 0x0b0be50300004e20) -> DEBUG: tag00 + @ 20000 ms 

[**ERROR**]  from BOARD 10.0.1.1 (l-hv3-hand), src LOCAL, adr 0, time 1s 955m 235u: 
(code 0x04000000, par16 0x0000 par64 0x0b0be50300004e20) -> DEBUG: tag00 + handler OSAL, code 0xe5 

[**ERROR**]  from BOARD 10.0.1.1 (l-hv3-hand), src LOCAL, adr 0, time 1s 955m 348u: 
(code 0x04000000, par16 0x0000 par64 0x0b0be50300004e20) -> DEBUG: tag00 + type osal_stackovf 

[**ERROR**]  from BOARD 10.0.1.1 (l-hv3-hand), src LOCAL, adr 0, time 1s 955m 467u: 
(code 0x04000000, par16 0x0000 par64 0x0b0be50300004e20) -> DEBUG: tag00 + IRQHan SVCall Thread runDO 

[**ERROR**]  from BOARD 10.0.1.1 (l-hv3-hand), src LOCAL, adr 0, time 1s 955m 581u: 
(code 0x04000000, par16 0x0000 par64 0x0b0be50300004e20) -> DEBUG: tag00 + ipsr 11, tid 11

List. Board 10.0.1.1 has detected a fatal error (first message of type DEBUG: tag00 with string RESTARTED after FATAL error) at its execution time of 20 s (second message, with string @ 20000 ms). The third and fourth messages tell that the error was raised by the OSAL handler and is due to a stack overflow (see string type osal_stackovf). The fifth message tells that the handler was called from the IRQHandler SVCall, which is the one that does thread switching, and that the error was caused by the last scheduled thread, called runDO (the one which ticks all the services at 1 kHz).
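
The `par64` value appears to carry the same information as the text messages in packed form. The sketch below decodes it under a layout that is only inferred from the two `par64` values shown in this description (`0x0b0be50300004e20` and `0x030364060013a26e`); it is an assumption, not a documented format. The function name `decode_par64` and the field names are hypothetical. Because `ipsr` and `tid` are equal in both examples, their byte order cannot be told apart, so the two top bytes are returned without claiming which is which.

```python
def decode_par64(par64: int) -> dict:
    """Split par64 into fields, assuming the layout inferred from the logs."""
    return {
        # Top two bytes: ipsr and tid (order not determinable from the examples).
        "byte7": (par64 >> 56) & 0xFF,
        "byte6": (par64 >> 48) & 0xFF,
        # Error code as printed in "handler <name>, code 0x..".
        "code": (par64 >> 40) & 0xFF,
        # Numeric id of the handler (3 with OSAL, 6 with hw_HardFault in the logs).
        "handler_id": (par64 >> 32) & 0xFF,
        # Low 32 bits: execution time of the error in milliseconds.
        "millisec": par64 & 0xFFFFFFFF,
    }
```

For the OSAL example, `decode_par64(0x0b0be50300004e20)` gives top bytes 11 and 11, code `0xe5`, handler id 3 and 20000 ms, which matches the strings `ipsr 11, tid 11`, `code 0xe5` and `@ 20000 ms`.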

In the case of the hw_HardFault handler, we also send the content of the CFSR register, which can help identify the cause of the fault.

1068,141704 <INFO>  from BOARD 10.0.1.20 (head-eb20-j0_1), src LOCAL, adr 0, time 3s 781m 884u:
(code 0x0000003b, par16 0x0000 par64 0x0000000000000000) -> SYS: the board is bootstrapping + .

1068,141765 <ERROR>  from BOARD 10.0.1.20 (head-eb20-j0_1), src LOCAL, adr 0, time 3s 783m 489u:
(code 0x04000000, par16 0x0000 par64 0x030364060013a26e) -> DEBUG: tag00 + RESTARTED after FATAL error

1068,141838 <ERROR>  from BOARD 10.0.1.20 (head-eb20-j0_1), src LOCAL, adr 0, time 3s 783m 600u:
(code 0x04000000, par16 0x0000 par64 0x030364060013a26e) -> DEBUG: tag00 + @ 1286766 ms

1068,141890 <ERROR>  from BOARD 10.0.1.20 (head-eb20-j0_1), src LOCAL, adr 0, time 3s 783m 722u:
(code 0x04000000, par16 0x0000 par64 0x030364060013a26e) -> DEBUG: tag00 + handler hw_HardFault, code 0x64

1068,151700 <ERROR>  from BOARD 10.0.1.20 (head-eb20-j0_1), src LOCAL, adr 0, time 3s 783m 832u:
(code 0x04000000, par16 0x0000 par64 0x030364060013a26e) -> DEBUG: tag00 + type see TBL

1068,151839 <ERROR>  from BOARD 10.0.1.20 (head-eb20-j0_1), src LOCAL, adr 0, time 3s 783m 952u:
(code 0x04000000, par16 0x0000 par64 0x030364060013a26e) -> DEBUG: tag00 + IRQHan HardFault Thread tmrma

1068,151902 <ERROR>  from BOARD 10.0.1.20 (head-eb20-j0_1), src LOCAL, adr 0, time 3s 784m 68u:
(code 0x04000000, par16 0x0000 par64 0x030364060013a26e) -> DEBUG: tag00 + ipsr 3, tid 3

1068,151902 <ERROR>  from BOARD 10.0.1.20 (head-eb20-j0_1), src LOCAL, adr 0, time 3s 784m 68u:
(code 0x04000000, par16 0x0000 par64 0x030364060013a26e) -> DEBUG: tag00 + CFSR 0x00000000

List. Board 10.0.1.20 has detected a fatal error of type hw_HardFault. In such a case we also transmit the content of the CFSR register.
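
To read the `ipsr` and `CFSR` values on the receiving side, a small decoder can help. The sketch below assumes the standard ARMv7-M (Cortex-M3/M4) definitions of the IPSR exception numbers and of the CFSR bit fields; that the boards use such a core is an assumption based on the register names in the messages. The names `exception_name` and `decode_cfsr` are hypothetical helpers, not part of the firmware or of yarprobotinterface.

```python
# IPSR exception numbers for system exceptions (ARMv7-M).
EXCEPTIONS = {
    1: "Reset", 2: "NMI", 3: "HardFault", 4: "MemManage", 5: "BusFault",
    6: "UsageFault", 11: "SVCall", 12: "DebugMonitor", 14: "PendSV",
    15: "SysTick",
}

# CFSR flag bits: MMFSR is bits 0-7, BFSR is bits 8-15, UFSR is bits 16-31.
CFSR_FLAGS = {
    0: "IACCVIOL", 1: "DACCVIOL", 3: "MUNSTKERR", 4: "MSTKERR",
    5: "MLSPERR", 7: "MMARVALID",
    8: "IBUSERR", 9: "PRECISERR", 10: "IMPRECISERR", 11: "UNSTKERR",
    12: "STKERR", 13: "LSPERR", 15: "BFARVALID",
    16: "UNDEFINSTR", 17: "INVSTATE", 18: "INVPC", 19: "NOCP",
    24: "UNALIGNED", 25: "DIVBYZERO",
}


def exception_name(ipsr: int) -> str:
    """Name the exception for an IPSR value; 16 and above are external IRQs."""
    if ipsr >= 16:
        return f"IRQ{ipsr - 16}"
    return EXCEPTIONS.get(ipsr, "Thread mode" if ipsr == 0 else "Reserved")


def decode_cfsr(cfsr: int) -> list:
    """Return the names of the flags set in CFSR, lowest bit first."""
    return [name for bit, name in sorted(CFSR_FLAGS.items())
            if cfsr & (1 << bit)]
```

With the values from the two listings, `exception_name(11)` gives `SVCall` and `exception_name(3)` gives `HardFault`, in agreement with the `IRQHan` strings. The value `CFSR 0x00000000` decodes to an empty list, meaning that no fault status flag was set in the transmitted value.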

Tests

The resulting binaries have been extensively tested on a test bench with an ems board and also on the iCubGenova09 robot, which actually produced the messages emitted by board 10.0.1.20.

* ems v3.42, mc4plus v3.36, mc2plus v3.26 - built on 2021 jun 23 - main changes from previous release are:
  - enabled the fatal error handler to force a restart and send useful information to yarprobotinterface (see robotology/icub-firmware#191)
* updated versions

marcoaccame added 2 commits June 22, 2021 12:18

added EOtheFatalError to ems, mc4plus, mc2plus

189df23

added specific info for fatalerror_handler_hw_HardFault

1ad3073

marcoaccame marked this pull request as draft June 22, 2021 11:09

enriched diagnostic messages for the hardfault case

cc2fb4a

marcoaccame mentioned this pull request Jun 23, 2021

ems v3.42, mc4plus v3.36, mc2plus v3.26 (fatal error handler) robotology/icub-firmware-build#32

Merged

marcoaccame marked this pull request as ready for review June 23, 2021 08:27

marcoaccame merged commit 17f1bdb into robotology:devel Jun 23, 2021

marcoaccame mentioned this pull request Sep 6, 2021

Analysis of FW of mc4plus to solve board's disappearance from the ETH network #174

Open

marcoaccame deleted the feat/fatal-error-handler branch January 11, 2022 09:43
