Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the open_ack bug #601

Conversation

yellowhatter
Copy link
Contributor

We need to do init_transport_unicast with a new link only after sending open_ack on the link, otherwise some data may be scheduled through the transport before the open_ack, which causes Establish procedure to fail!

@yellowhatter
Copy link
Contributor Author

NOTE: I will review the multicast transport for the same bug, but there is no Establish procedure there, so it seems that the multicast is not affected by this

@Mallets
Copy link
Member

Mallets commented Nov 28, 2023

The problem of sending an OpenAck before finalizing the transport/link initialization is the following:

  • as of today, open returns an error if transport/link initialization fails
  • with this PR, open returns a ok and then a close is sent if transport/link initialization fails
    This changes the expected behaviour (as highlighted by the failing CI tests?).

@yellowhatter
Copy link
Contributor Author

yellowhatter commented Nov 28, 2023

The problem of sending an OpenAck before finalizing the transport/link initialization is the following:

  • as of today, open returns an error if transport/link initialization fails
  • with this PR, open returns a ok and then a close is sent if transport/link initialization fails
    This changes the expected behaviour (as highlighted by the failing CI tests?).

Yes, now I see that

@yellowhatter
Copy link
Contributor Author

Anyway, I wanna make a design that sends open_ack BEFORE making a new link available for levels above the transport

@Mallets
Copy link
Member

Mallets commented Nov 28, 2023

Anyway, I wanna make a design that sends open_ack BEFORE making a new link available for levels above the transport

Why do you think that is needed? As of today, if the level above the transport returns an error during initialization the OpenAck will not be sent.

@yellowhatter
Copy link
Contributor Author

Anyway, I wanna make a design that sends open_ack BEFORE making a new link available for levels above the transport

Why do you think that is needed? As of today, if the level above the transport returns an error during initialization the OpenAck will not be sent.

Because otherwise we have a bug that causes intermittent tests randomly fail. The problem in initial code is that if we call init_transport_unicast before sending open_ack, the upper logic gets access to the recently-added transport through get_transports_unicast() and is able to schedule messages on that transport before open_ack. As a consequence, the other side of the connection gets application message instead of expected open_ack and fails to finalize Establish.....

@yellowhatter
Copy link
Contributor Author

I'm gonna refactor some things here to also avoid the problems you described

@Mallets
Copy link
Member

Mallets commented Nov 28, 2023

Anyway, I wanna make a design that sends open_ack BEFORE making a new link available for levels above the transport

Why do you think that is needed? As of today, if the level above the transport returns an error during initialization the OpenAck will not be sent.

Because otherwise we have a bug that causes intermittent tests randomly fail. The problem in initial code is that if we call init_transport_unicast before sending open_ack, the upper logic gets access to the recently-added transport through get_transports_unicast() and is able to schedule messages on that transport before open_ack. As a consequence, the other side of the connection gets application message instead of expected open_ack and fails to finalize Establish.....

I think there is an error on the above explanation. The upper layer is notified in the finalize_transport and not in the init_transport_unicast. The finalize_transport is already called after sending the OpenAck.

@yellowhatter
Copy link
Contributor Author

yellowhatter commented Nov 28, 2023

Anyway, I wanna make a design that sends open_ack BEFORE making a new link available for levels above the transport

Why do you think that is needed? As of today, if the level above the transport returns an error during initialization the OpenAck will not be sent.

Because otherwise we have a bug that causes intermittent tests randomly fail. The problem in initial code is that if we call init_transport_unicast before sending open_ack, the upper logic gets access to the recently-added transport through get_transports_unicast() and is able to schedule messages on that transport before open_ack. As a consequence, the other side of the connection gets application message instead of expected open_ack and fails to finalize Establish.....

I think there is an error on the above explanation. The upper layer is notified in the finalize_transport and not in the init_transport_unicast. The finalize_transport is already called after sending the OpenAck.

init_transport_unicast adds newly-created transport to self.state.unicast.transports here
If some code concurrently calls get_transport_unicast or get_transports_unicast, it will get access to the transport object concurrently with Establish's accept_link routine and has a chance to schedule some messages before open_ack will be scheduled. This is exactly what causes our CI to fail on transport_unicast_intermittent tests!
I think the safest way would be to avoid having semi-initialized transport with incomplete Establish at Manager's self.state.unicast.transports at any point of time.

@Mallets
Copy link
Member

Mallets commented Nov 28, 2023

Anyway, I wanna make a design that sends open_ack BEFORE making a new link available for levels above the transport

Why do you think that is needed? As of today, if the level above the transport returns an error during initialization the OpenAck will not be sent.

Because otherwise we have a bug that causes intermittent tests randomly fail. The problem in initial code is that if we call init_transport_unicast before sending open_ack, the upper logic gets access to the recently-added transport through get_transports_unicast() and is able to schedule messages on that transport before open_ack. As a consequence, the other side of the connection gets application message instead of expected open_ack and fails to finalize Establish.....

I think there is an error on the above explanation. The upper layer is notified in the finalize_transport and not in the init_transport_unicast. The finalize_transport is already called after sending the OpenAck.

init_transport_unicast adds newly-created transport to self.state.unicast.transports here If some code concurrently calls get_transport_unicast or get_transports_unicast, it will get access to the transport object concurrently with Establish's accept_link routine and has a chance to schedule some messages before open_ack will be scheduled. This is exactly what causes our CI to fail on transport_unicast_intermittent tests! I think the safest way would be to avoid having semi-initialized transport with incomplete Establish at Manager's self.state.unicast.transports at any point of time.

The problem is that during Init/Open handshake some transport level parameters are exchanged. Every link coming from the same originator (e.g. peer) is required to include the same transport-level parameters for concurrency robustness. If that's not the case, the link establishment is considered invalid and the link is closed.

I hence believe we should have an intermediate state where we store transport parameters for already initialised transports. In this way, during link establishment, we can check the actual parameters and return an error if necessary.
Once the OpenAck is successfully sent, we can add the finalized transport to the final list in such a way get_transports_unicast operates as expected. What do you think?

@yellowhatter
Copy link
Contributor Author

Anyway, I wanna make a design that sends open_ack BEFORE making a new link available for levels above the transport

Why do you think that is needed? As of today, if the level above the transport returns an error during initialization the OpenAck will not be sent.

Because otherwise we have a bug that causes intermittent tests randomly fail. The problem in initial code is that if we call init_transport_unicast before sending open_ack, the upper logic gets access to the recently-added transport through get_transports_unicast() and is able to schedule messages on that transport before open_ack. As a consequence, the other side of the connection gets application message instead of expected open_ack and fails to finalize Establish.....

I think there is an error on the above explanation. The upper layer is notified in the finalize_transport and not in the init_transport_unicast. The finalize_transport is already called after sending the OpenAck.

init_transport_unicast adds newly-created transport to self.state.unicast.transports here If some code concurrently calls get_transport_unicast or get_transports_unicast, it will get access to the transport object concurrently with Establish's accept_link routine and has a chance to schedule some messages before open_ack will be scheduled. This is exactly what causes our CI to fail on transport_unicast_intermittent tests! I think the safest way would be to avoid having semi-initialized transport with incomplete Establish at Manager's self.state.unicast.transports at any point of time.

The problem is that during Init/Open handshake some transport level parameters are exchanged. Every link coming from the same originator (e.g. peer) is required to include the same transport-level parameters for concurrency robustness. If that's not the case, the link establishment is considered invalid and the link is closed.

I hence believe we should have an intermediate state where we store transport parameters for already initialised transports. In this way, during link establishment, we can check the actual parameters and return an error if necessary. Once the OpenAck is successfully sent, we can add the finalized transport to the final list in such a way get_transports_unicast operates as expected. What do you think?

Yes, you are correct, but the problem is somewhat deeper in the detail. init_transport_unicast has two branches inside:

  • create a new transport (that we discussed and your approach is good)
  • add a new link to already existing transport (multilink support)
    And for "add a new link" branch the problem with incomplete Establish is even harder: we take already existing, initialized and potentially in-use (get_transports_unicast) transport and add a link with incomplete Establish.... The same case. If we complete the Establish before calling add_link on the transport, there could also be a problem because add_link may fail due to some internal transport-implementation-defined checks - and here we are again run into Close after successful OpenAck.

So, speaking rustwise: "transport must consume the link object, ensuring that nobody will ever write a logic that intend to send/receieve something through the link object after it was passed to the transport" - I will say that we will get a good and safe
accept\open implementation if we manage to refactor it's logic with this restriction.

@Mallets
Copy link
Member

Mallets commented Nov 29, 2023

It sounds reasonable. What do you propose as next steps?

@yellowhatter
Copy link
Contributor Author

It sounds reasonable. What do you propose as next steps?

My proposition is: modify the transport trait in such a way that it would consume the link object in add_link (and in lowlatency transport new) and refactor a little accept/open to make everything work. As the result, we will get the code that is resistant to such kind of logical errors in the future - that is especially important for further access control and security development, as it would most likely touch the Establish procedure.

I'm already in progress on it.

@Mallets Mallets mentioned this pull request Nov 29, 2023
@Mallets
Copy link
Member

Mallets commented Nov 30, 2023

#585 has been merged, I think you can now rebase the changes on master.

@Mallets
Copy link
Member

Mallets commented Dec 11, 2023

A wider fix is being proposed in #610

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants