We noticed strange behavior where devices (in our case, using FSK) would randomly drop out, after hours, days or weeks of running fine.
We think we have now found the root cause: during SX1276SetOpMode() in SX1276SetTx(), a Dio1 IRQ may already arrive, and the IRQ handler then writes the next chunk of data to the FIFO. To do so, it emits 0x80 on SPI (the FIFO address 0x00 with the write bit set), typically followed by the second chunk of data; the first chunk has already been written in SX1276Send().
If the timing is exactly right, the FIFO SPI transfer effectively becomes part of the SX1276Write() SPI transaction that writes to register 0x01, so that 0x80 followed by the data chunk is written over registers 0x02 and onward. The resulting configuration (e.g. for channel and bitrate) is so odd that the device does not recover from it, and other issues may arise as a symptom.
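To make the mechanism concrete, here is a minimal host-side sketch of the race. It is a simulation, not LoRaMac-node code: MockRadio, mock_nss(), mock_spi_byte(), fifo_write() and write_opmode() are hypothetical names, and the model assumes that the "interrupt" fires after the RegOpMode data byte has been clocked out but before NSS is released. It relies on the SX1276 burst-write behavior, where the address auto-increments for every register except the FIFO at 0x00.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the radio's SPI slave (illustration only). */
typedef struct {
    uint8_t regs[0x80]; /* register file; address 0x00 is the FIFO port */
    uint8_t fifo[64];   /* bytes that really reached the FIFO */
    int     fifo_len;
    int     selected;   /* 1 while NSS is low */
    int     have_addr;  /* 1 once the address byte of this transaction was seen */
    int     write;      /* 1 if the address byte had the write bit (0x80) set */
    uint8_t addr;       /* current register address */
} MockRadio;

/* Drive NSS. Pulling it low while it is already low does NOT start a new
 * transaction; this is what lets the nested transfer merge into the outer one. */
static void mock_nss(MockRadio *r, int level)
{
    assert(r != NULL);
    if (level == 0 && !r->selected) {
        r->selected  = 1;
        r->have_addr = 0;
    } else if (level == 1) {
        r->selected = 0;
    }
}

/* Clock one byte into the radio. */
static void mock_spi_byte(MockRadio *r, uint8_t b)
{
    assert(r != NULL);
    if (!r->selected) return;
    if (!r->have_addr) {                 /* first byte: address + write bit */
        r->addr      = b & 0x7F;
        r->write     = (b & 0x80) != 0;
        r->have_addr = 1;
        return;
    }
    if (!r->write) return;
    if (r->addr == 0x00) {               /* FIFO access: no auto-increment */
        if (r->fifo_len < (int)sizeof r->fifo) r->fifo[r->fifo_len++] = b;
    } else {                             /* burst write: address auto-increments */
        r->regs[r->addr] = b;
        r->addr = (uint8_t)((r->addr + 1) & 0x7F);
    }
}

/* FIFO refill as the Dio1 handler would issue it: 0x80, then the chunk. */
static void fifo_write(MockRadio *r, const uint8_t *chunk, int len)
{
    mock_nss(r, 0);
    mock_spi_byte(r, 0x00 | 0x80);
    for (int i = 0; i < len; i++) mock_spi_byte(r, chunk[i]);
    mock_nss(r, 1);
}

/* Write RegOpMode (0x01). If irq_chunk is non-NULL, the simulated Dio1
 * handler runs after the data byte but before NSS is released. */
static void write_opmode(MockRadio *r, uint8_t value,
                         const uint8_t *irq_chunk, int irq_len)
{
    mock_nss(r, 0);
    mock_spi_byte(r, 0x01 | 0x80);
    mock_spi_byte(r, value);
    if (irq_chunk != NULL) fifo_write(r, irq_chunk, irq_len);
    mock_nss(r, 1);
}
```

Calling write_opmode() on a zeroed MockRadio with a three-byte chunk leaves 0x80 in register 0x02 and the chunk in registers 0x03 to 0x05, while the FIFO stays empty. In FSK mode those addresses hold the bitrate and frequency deviation settings, with the carrier frequency registers right behind them, which matches the odd channel and bitrate configuration described above.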
We were able to verify the described behavior by reading the actual register contents in the defunct state and comparing them to the message that had just been submitted and was still present in the Tx buffer.
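A check along these lines could be automated when inspecting a defunct device. The following is only a sketch: find_fifo_spill() is a hypothetical helper, the register dump is assumed to be indexed by register address, and it assumes the first few registers after 0x02 read back exactly what was written, which may not hold where a register has reserved or read-only bits. For that reason the number of bytes to compare is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Look for the corruption signature in a register dump: 0x80 in register
 * 0x02, followed by min_match bytes that also occur in the Tx buffer.
 * Returns the offset into tx where the spilled chunk starts, or -1. */
static int find_fifo_spill(const uint8_t *regs, size_t nregs,
                           const uint8_t *tx, size_t tx_len,
                           size_t min_match)
{
    assert(regs != NULL && tx != NULL);
    if (min_match == 0 || nregs < 0x03 + min_match) return -1;
    if (regs[0x02] != 0x80) return -1;   /* FIFO write command not present */

    for (size_t off = 0; off + min_match <= tx_len; off++) {
        if (memcmp(&regs[0x03], &tx[off], min_match) == 0) {
            return (int)off;             /* chunk found in the Tx buffer */
        }
    }
    return -1;
}
```

A non-negative result suggests that a FIFO refill ended up in the register file, and the offset tells which part of the message was being written when it happened.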
As our driver is based on a fork of LoRaMac-node, I suspect that the issue we uncovered also affects LoRaMac-node. A simple solution is to disable interrupts during SX1276SetOpMode() in SX1276SetTx(). A pull request to this end has been submitted.
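The idea behind the fix can be sketched as a small host-side model. This is not the content of the pull request: irq_save_disable(), irq_restore(), raise_dio1() and the set_tx_* functions are hypothetical stand-ins, and on a real target the platform's own critical-section primitives would be used instead. The model only shows that an interrupt raised during the register write is deferred until the write has completed.

```c
#include <assert.h>

static int irq_enabled = 1;               /* global interrupt enable (model) */
static int irq_pending = 0;               /* Dio1 latched while disabled */
static int in_spi_write = 0;              /* 1 while the register write is on the bus */
static int handler_calls = 0;
static int handler_hits_inside_write = 0; /* how often the handler raced the write */

/* Stand-in for the Dio1 handler that refills the FIFO over SPI. */
static void dio1_handler(void)
{
    handler_calls++;
    if (in_spi_write) handler_hits_inside_write++;
}

/* Model of the hardware raising Dio1: runs now, or is latched if masked. */
static void raise_dio1(void)
{
    if (irq_enabled) dio1_handler();
    else irq_pending = 1;
}

/* Hypothetical critical-section primitives: save state and disable,
 * then restore and service anything that was latched meanwhile. */
static int irq_save_disable(void)
{
    int prev = irq_enabled;
    irq_enabled = 0;
    return prev;
}

static void irq_restore(int prev)
{
    irq_enabled = prev;
    if (irq_enabled && irq_pending) {
        irq_pending = 0;
        dio1_handler();
    }
}

/* Stand-in for the op mode register write; Dio1 fires midway. */
static void set_op_mode(void)
{
    assert(!in_spi_write);   /* the write itself must not be re-entered */
    in_spi_write = 1;
    raise_dio1();
    in_spi_write = 0;
}

static void set_tx_unprotected(void)
{
    set_op_mode();
}

static void set_tx_protected(void)
{
    int state = irq_save_disable();
    set_op_mode();
    irq_restore(state);
}
```

With set_tx_unprotected() the handler runs in the middle of the write; with set_tx_protected() it still runs exactly once, but only after the write has finished. Saving and restoring the previous interrupt state, rather than unconditionally re-enabling, keeps the sketch safe to call from a context where interrupts are already disabled.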
Note that this kind of bug is highly dependent on exact timing, and therefore probably also on channel, bitrate, platform, voltages and frequencies, among other things, so reproducing it is quite hard. Providing a test case is nearly impossible, unless one adds some delays here and there to widen the window of opportunity while increasing the traffic volume.
This may be related to various other incidental reports of register corruption. This may also be an equivalent issue for SX1272 whose relevant code is identical.