Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: receiving fragmented message gets stuck SOFIE-2680 #188

Merged
merged 9 commits into from
Oct 3, 2023

Conversation

Julusian
Copy link
Member

@Julusian Julusian commented Sep 28, 2023

  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

unit tests for a couple of bugs

  • What is the current behavior? (You can also link to an open issue here)

The receiving logic is not very durable, and can be broken in a couple of ways:

  1. A response gets broken over multiple TCP packets. Fragmentation like this is normal TCP behaviour, especially when using INFO 1 on a channel with many active layers
  2. When a deserializer fails, it crashes the parsing loop, and the connection gets stuck attempting to reparse the same message
  3. There is a race condition in the parsing, if a second packet is received before a deserializer finishes, it will get parsed twice.
  • What is the new behavior?

This fixes the known failures around fragmented or invalid responses.

There is a behavioural change here, previously an invalid response would result in a connection error and the command timing out. Now, an invalid response will resolve the command promise with a 500 error.

_unprocessedLines and _unprocessedData is discarded upon the connection opening/closing, otherwise an incomplete fragment could cause the first response after reconnecting to get lost.

  • Other information:
    Also, it looks like _unprocessedLines is not cleared when the connection opens. I suspect this will result in TSR triggering a reconnection due to the timeout/error, but the same broken responses will be left in the queue, causing all future commands to also timeout until the device is restarted. I have not attempted to confirm this suspicion

I have tested this change with a script set to fire many commands at caspar, and is ensuring that it gets a successful response back.
Prior to this change running INFO 2 would often time out, due to the reply being split across tcp packets. It is now successfully getting back 201 each time

@codecov-commenter
Copy link

codecov-commenter commented Sep 29, 2023

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (4b1bc55) 67.81% compared to head (82a398d) 79.60%.
Report is 5 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master     #188       +/-   ##
===========================================
+ Coverage   67.81%   79.60%   +11.78%     
===========================================
  Files           5        6        +1     
  Lines         435      554      +119     
  Branches       82      101       +19     
===========================================
+ Hits          295      441      +146     
+ Misses        129      109       -20     
+ Partials       11        4        -7     
Files Coverage Δ
src/api.ts 61.38% <100.00%> (ø)
src/connection.ts 84.12% <83.87%> (+59.12%) ⬆️

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

If there is any data there, it will be incomplete, with the remainder lost due to the broken connection
@Julusian Julusian marked this pull request as ready for review September 29, 2023 13:27
@Julusian Julusian changed the title chore: add test to check receiving fragmented message chore: add test to check receiving fragmented message SOFIE-2680 Sep 29, 2023
Copy link
Contributor

@mint-dewit mint-dewit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good improvement fixing some naive assumptions from my side. Couple of notes:

  • I'm not a fan of the pattern with setTimeout. I'd rather duplicate line 200 containing the splice than adding many levels of indentation. There is something to say for parsing the responses asynchronously though but then maybe it should be separated into an async private function.
  • I'm not a fan of overwriting the response code in case of a parsing error. I would like for this library to be as transparent as possible. I think returning a different data type containing an error would be more suitable. Preferably something that can easily be checked for by a library user. Maybe one could even argue that invalid data is the same as no data?
  • Perhaps the unit tests can be simplified a bit by passing around callbacks with the test logic to a function that sets up and tears down the connection.

@Julusian Julusian changed the title chore: add test to check receiving fragmented message SOFIE-2680 fix: receiving fragmented message gets stuck SOFIE-2680 Oct 3, 2023
@Julusian
Copy link
Member Author

Julusian commented Oct 3, 2023

@mint-dewit

I'm not a fan of the pattern with setTimeout. I'd rather duplicate line 200 containing the splice than adding many levels of indentation. There is something to say for parsing the responses asynchronously though but then maybe it should be separated into an async private function.

Yeah, the setImmediate wasn't necessary for the behaviour I was after because of the Promise wrapping, so that has been dropped with that promise pulled out into its own function.

I'm not a fan of overwriting the response code in case of a parsing error. I would like for this library to be as transparent as possible. I think returning a different data type containing an error would be more suitable. Preferably something that can easily be checked for by a library user. Maybe one could even argue that invalid data is the same as no data?

Good point. I've reworked it so that instead of a mangled response, the request promise will now reject with a custom Error type (which contains the deserialize error and the original response)

Perhaps the unit tests can be simplified a bit by passing around callbacks with the test logic to a function that sets up and tears down the connection.

I've done a little bit of refactoring, I am not sure what further to do currently

@Julusian Julusian requested a review from mint-dewit October 3, 2023 09:50
@Julusian Julusian merged commit 15739ae into master Oct 3, 2023
10 checks passed
@Julusian Julusian deleted the bug/message-fragmentation branch October 3, 2023 11:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants