Fatal error handler #191

Merged
merged 3 commits into from
Jun 23, 2021

@marcoaccame marcoaccame commented Jun 22, 2021

This PR introduces a more powerful fatal error handler which:

  • saves info about the fatal error on NZI RAM,
  • forces a restart of the MPU,
  • and then sends such info to yarprobotinterface

This PR addresses the issue in here.
The relevant binaries are in this PR.

Description of the information sent to yarprobotinterface

In case of a fatal error, the board will restart and send messages such as:

[**INFO**]  from BOARD 10.0.1.1 (l-hv3-hand), src LOCAL, adr 0, time 1s 953m 401u: 
(code 0x0000003b, par16 0x0000 par64 0x0000000000000000) -> SYS: the board is bootstrapping + . 

[**ERROR**]  from BOARD 10.0.1.1 (l-hv3-hand), src LOCAL, adr 0, time 1s 955m 7u: 
(code 0x04000000, par16 0x0000 par64 0x0b0be50300004e20) -> DEBUG: tag00 + RESTARTED after FATAL error 

[**ERROR**]  from BOARD 10.0.1.1 (l-hv3-hand), src LOCAL, adr 0, time 1s 955m 117u: 
(code 0x04000000, par16 0x0000 par64 0x0b0be50300004e20) -> DEBUG: tag00 + @ 20000 ms 

[**ERROR**]  from BOARD 10.0.1.1 (l-hv3-hand), src LOCAL, adr 0, time 1s 955m 235u: 
(code 0x04000000, par16 0x0000 par64 0x0b0be50300004e20) -> DEBUG: tag00 + handler OSAL, code 0xe5 

[**ERROR**]  from BOARD 10.0.1.1 (l-hv3-hand), src LOCAL, adr 0, time 1s 955m 348u: 
(code 0x04000000, par16 0x0000 par64 0x0b0be50300004e20) -> DEBUG: tag00 + type osal_stackovf 

[**ERROR**]  from BOARD 10.0.1.1 (l-hv3-hand), src LOCAL, adr 0, time 1s 955m 467u: 
(code 0x04000000, par16 0x0000 par64 0x0b0be50300004e20) -> DEBUG: tag00 + IRQHan SVCall Thread runDO 

[**ERROR**]  from BOARD 10.0.1.1 (l-hv3-hand), src LOCAL, adr 0, time 1s 955m 581u: 
(code 0x04000000, par16 0x0000 par64 0x0b0be50300004e20) -> DEBUG: tag00 + ipsr 11, tid 11

List. Board 10.0.1.1 detected a fatal error (first message of type DEBUG: tag00, with string RESTARTED after FATAL error) at execution time 20 s (second message, with string @ 20000 ms). The third and fourth messages tell that the error was raised by the OSAL handler and is due to a stack overflow (see string type osal_stackovf). The handler was called from the IRQHandler SVCall, which is the one that does thread switching, and the error was caused by the last scheduled thread, called runDO (the one that ticks all the services at 1 kHz).
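
The par64 value attached to each message appears to carry the same information in packed form. The sketch below is only an inference from the values shown in this description, not a documented format: it assumes the low 32 bits hold the time in ms, and the upper four bytes hold ipsr, tid, the error code and a handler identifier. Because ipsr and tid are equal in every message shown here, their relative order cannot be confirmed and is a guess. The names `FatalInfo`, `decode_par64` and `exception_name` are hypothetical helpers; the exception-number table is the standard Arm Cortex-M one.

```python
from typing import NamedTuple

# Standard Arm Cortex-M exception numbers as reported by the IPSR register.
CORTEX_M_EXCEPTIONS = {
    0: "Thread mode",
    1: "Reset",
    2: "NMI",
    3: "HardFault",
    4: "MemManage",
    5: "BusFault",
    6: "UsageFault",
    11: "SVCall",
    12: "DebugMonitor",
    14: "PendSV",
    15: "SysTick",
}


class FatalInfo(NamedTuple):
    ipsr: int      # ASSUMED to be bits 63..56 (could be swapped with tid)
    tid: int       # ASSUMED to be bits 55..48 (could be swapped with ipsr)
    code: int      # bits 47..40, matches "code 0xe5" / "code 0x64" in the logs
    handler: int   # bits 39..32, handler identifier (meaning not documented here)
    millisec: int  # bits 31..0, matches "@ 20000 ms" / "@ 1286766 ms"


def decode_par64(par64: int) -> FatalInfo:
    """Split a 64-bit par64 value according to the layout assumed above."""
    return FatalInfo(
        ipsr=(par64 >> 56) & 0xFF,
        tid=(par64 >> 48) & 0xFF,
        code=(par64 >> 40) & 0xFF,
        handler=(par64 >> 32) & 0xFF,
        millisec=par64 & 0xFFFFFFFF,
    )


def exception_name(ipsr: int) -> str:
    """Map an IPSR exception number to its Cortex-M name."""
    if ipsr >= 16:
        return f"IRQ{ipsr - 16}"  # external interrupts start at 16
    return CORTEX_M_EXCEPTIONS.get(ipsr, "reserved")


if __name__ == "__main__":
    info = decode_par64(0x0B0BE50300004E20)
    print(info, exception_name(info.ipsr))
```

Under these assumptions, 0x0b0be50300004e20 decodes to ipsr 11 (SVCall), tid 11, code 0xe5 and 20000 ms, which agrees with the text of the messages above.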

In the case of the hw_HardFault handler, we also send the content of the CFSR register, which can help detect the cause of the fault.

1068,141704 <INFO>  from BOARD 10.0.1.20 (head-eb20-j0_1), src LOCAL, adr 0, time 3s 781m 884u:
(code 0x0000003b, par16 0x0000 par64 0x0000000000000000) -> SYS: the board is bootstrapping + .

1068,141765 <ERROR>  from BOARD 10.0.1.20 (head-eb20-j0_1), src LOCAL, adr 0, time 3s 783m 489u:
(code 0x04000000, par16 0x0000 par64 0x030364060013a26e) -> DEBUG: tag00 + RESTARTED after FATAL error

1068,141838 <ERROR>  from BOARD 10.0.1.20 (head-eb20-j0_1), src LOCAL, adr 0, time 3s 783m 600u:
(code 0x04000000, par16 0x0000 par64 0x030364060013a26e) -> DEBUG: tag00 + @ 1286766 ms

1068,141890 <ERROR>  from BOARD 10.0.1.20 (head-eb20-j0_1), src LOCAL, adr 0, time 3s 783m 722u:
(code 0x04000000, par16 0x0000 par64 0x030364060013a26e) -> DEBUG: tag00 + handler hw_HardFault, code 0x64

1068,151700 <ERROR>  from BOARD 10.0.1.20 (head-eb20-j0_1), src LOCAL, adr 0, time 3s 783m 832u:
(code 0x04000000, par16 0x0000 par64 0x030364060013a26e) -> DEBUG: tag00 + type see TBL

1068,151839 <ERROR>  from BOARD 10.0.1.20 (head-eb20-j0_1), src LOCAL, adr 0, time 3s 783m 952u:
(code 0x04000000, par16 0x0000 par64 0x030364060013a26e) -> DEBUG: tag00 + IRQHan HardFault Thread tmrma

1068,151902 <ERROR>  from BOARD 10.0.1.20 (head-eb20-j0_1), src LOCAL, adr 0, time 3s 784m 68u:
(code 0x04000000, par16 0x0000 par64 0x030364060013a26e) -> DEBUG: tag00 + ipsr 3, tid 3

1068,151902 <ERROR>  from BOARD 10.0.1.20 (head-eb20-j0_1), src LOCAL, adr 0, time 3s 784m 68u:
(code 0x04000000, par16 0x0000 par64 0x030364060013a26e) -> DEBUG: tag00 + CFSR 0x00000000

List. Board 10.0.1.20 has detected a fatal error of type hw_HardFault. In such a case we also transmit the content of the CFSR register.
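
To read a CFSR value from such a message, the flag bits can be decoded on the host side. The following is a minimal sketch, assuming the board's MPU follows the Armv7-M (Cortex-M3/M4/M7) CFSR layout, with MMFSR in bits 7..0, BFSR in bits 15..8 and UFSR in bits 31..16. The helper name `decode_cfsr` is hypothetical and is not part of the firmware or of yarprobotinterface.

```python
from typing import List

# Armv7-M Configurable Fault Status Register flag bits.
CFSR_FLAGS = {
    # MemManage Fault Status Register (bits 7..0)
    0: "IACCVIOL",
    1: "DACCVIOL",
    3: "MUNSTKERR",
    4: "MSTKERR",
    5: "MLSPERR",
    7: "MMARVALID",
    # BusFault Status Register (bits 15..8)
    8: "IBUSERR",
    9: "PRECISERR",
    10: "IMPRECISERR",
    11: "UNSTKERR",
    12: "STKERR",
    13: "LSPERR",
    15: "BFARVALID",
    # UsageFault Status Register (bits 31..16)
    16: "UNDEFINSTR",
    17: "INVSTATE",
    18: "INVPC",
    19: "NOCP",
    24: "UNALIGNED",
    25: "DIVBYZERO",
}


def decode_cfsr(cfsr: int) -> List[str]:
    """Return the names of the flags set in a CFSR value, lowest bit first."""
    return [name for bit, name in sorted(CFSR_FLAGS.items()) if cfsr & (1 << bit)]


if __name__ == "__main__":
    # Value taken from the last message of the listing above.
    print(decode_cfsr(0x00000000))
    # Illustrative value only (not from the logs): precise bus fault with valid BFAR.
    print(decode_cfsr(0x00008200))
```

The value 0x00000000 in the listing decodes to an empty list, meaning that none of these flags is set in the transmitted value.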

Tests

The resulting binaries have been extensively tested on a test bench with an ems board and also on the iCubGenova09 robot, which actually produced the messages emitted by board 10.0.1.20.

@marcoaccame marcoaccame marked this pull request as draft June 22, 2021 11:09
marcoaccame added a commit to robotology/icub-firmware-build that referenced this pull request Jun 23, 2021
* ems v3.42, mc4plus v3.36, mc2plus v3.26

- built on 2021 jun 23
- main changes from previous release are:
  - enabled the fatal error handler to force a restart and send useful information to yarprobotinterface (see robotology/icub-firmware#191)

* updated versions
@marcoaccame marcoaccame marked this pull request as ready for review June 23, 2021 08:27
@marcoaccame marcoaccame merged commit 17f1bdb into robotology:devel Jun 23, 2021
@marcoaccame marcoaccame deleted the feat/fatal-error-handler branch January 11, 2022 09:43