Fix sending local state to other participants #4558

Open
wants to merge 11 commits into
base: master

danxuliu
Member

@danxuliu danxuliu commented Dec 19, 2024

Fixes #3358

Requires #4536

First of all, sorry for the long, long delay and thanks a lot to everyone who provided information on the issue.

Note that the description below treats #4536 as part of this pull request (so the behaviours of the code are described in a pre-4536 state).

The problem started to happen in a0fa841 because, as described in the first part of #3358 (comment), the video is set as not available (or, rather, not known if available) when the connection state changes to NEW or CHECKING. This was done following the implementation of the WebUI. The idea is that until the connection is completely established it is not really known whether the video is enabled or not, even if the stream has a video track (it could be a disabled video anyway, so the legacy behaviour of setting it as available if there is a track should be removed, but that is a different story), so the other client should explicitly provide that information once the connection is established. It turns out that, although all that is correct with data channels, with signaling messages it would be possible to provide the state even before the peer connection is established, so this might need to be adjusted in the future. Although the change itself caused the video to no longer be shown (in some cases), the actual problem is in the sending side of the Android app.

As also described in the second part of #3358 (comment), sendInitialMediaStatus would not do anything when called on a subscriber peer connection, both because it does not have a local stream and because, even if it had one, it is not the right connection to send data channel messages when using the HPB, as they must be sent in the publisher connection. But independently of that, sendInitialMediaStatus will not be called anyway in most cases. It will not be called when the data channel is open, because for subscribers the observer is registered after the data channel was already open. And it will be randomly called when the connection state changes to CONNECTED, because that depends on hasInitiated. hasInitiated is meant to be used to initiate the peer connection without HPB if the local participant session ID is "higher" than the remote participant session ID; it is not needed and, moreover, is wrong when using the external signaling server, as in that case the comparison is made between the Nextcloud session ID and the signaling session ID.
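
The rule behind hasInitiated can be sketched roughly as follows, assuming that "higher" means a plain lexicographic string comparison. The class and method names are hypothetical and are not the app's code. When both IDs come from the same namespace exactly one side initiates; when one is a Nextcloud session ID and the other a signaling session ID the result no longer reflects any agreed ordering, so it is effectively arbitrary.

```java
/** Hypothetical sketch of the initiator rule; not the app's actual code. */
final class InitiatorRule {
    private InitiatorRule() {}

    /**
     * Returns true if the local side is expected to initiate the peer
     * connection. Assumption: "higher" is a lexicographic comparison.
     */
    static boolean shouldInitiate(String localSessionId, String remoteSessionId) {
        return localSessionId.compareTo(remoteSessionId) > 0;
    }
}
```

With IDs from the same namespace the two participants always reach opposite answers, which is what makes the rule usable without the HPB.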

In the case of publisher connections the data channel message will be also randomly sent when the connection state changes to CONNECTED due to the comparison between Nextcloud session ID and signaling session ID, although it will be always sent when the data channel is open. This means that if participant A is in a call and participant B joins then the state of participant A will not be sent to participant B (because it will be sent on the subscriber connection), but the state of participant B will be sent to participant A (because it will be sent on the publisher connection when the data channel is open). But then... why is the video of participant B visible for participant A only sometimes?

The reason is that when the HPB is used the connection is not established directly between both clients, but between the clients and the HPB. Therefore, when the data channel is open for participant B it is open between participant B and the HPB, but the data channel may not be open yet between participant A and the HPB. If the message is sent at this point it will reach the HPB, but it will not be relayed to participant A. Due to that reason the WebUI does not send just a single state message after the connection is established, but several ones with an exponential backoff to "ensure" that the state is received by the other participants even if their subscriber connection takes a while to be established.
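
As a rough illustration of the repeated sending, the delays of an exponential backoff could be computed like this. The helper name, the initial delay and the number of repetitions are assumptions made for the example; the actual values used by the WebUI or by this pull request are not stated here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; names and numbers are illustrative only. */
final class StateBackoff {
    private StateBackoff() {}

    /**
     * Returns the delays (in milliseconds) to wait before each repeated
     * send: initialDelayMs, then twice that, and so on.
     */
    static List<Long> delaysMs(long initialDelayMs, int repetitions) {
        List<Long> delays = new ArrayList<>();
        long delay = initialDelayMs;
        for (int i = 0; i < repetitions; i++) {
            delays.add(delay);
            delay *= 2;
        }
        return delays;
    }
}
```

For example, `StateBackoff.delaysMs(1000, 4)` yields waits of 1, 2, 4 and 8 seconds, so a subscriber connection that needs a few seconds to be established still has a chance to receive one of the later messages.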

But now the question is, why is video coming from the Android app always shown in the iOS app and the WebUI, despite all of the above?

The iOS app shows the video by default, so if a video track is sent by the Android app it is shown even if the Android app does not send a data channel message to enable it.

In the case of the WebUI received videos are disabled by default, but they are automatically enabled if it is detected that a video is being sent (which is a legacy behaviour and should not work like that, video should be shown only if explicitly enabled, but that is a different story). Therefore, again the video track sent by the Android app is shown even if the Android app does not send a data channel message to enable it.
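
The different receiving behaviours described above can be summarized with a deliberately simplified model. It only looks at whether a video track was received and whether an explicit message enabling the video arrived; all names are hypothetical and the real clients take more state into account.

```java
/** Simplified, hypothetical model of the receiving side of each client. */
enum Receiver { ANDROID, IOS, WEB_UI }

final class VideoVisibility {
    private VideoVisibility() {}

    static boolean isVideoShown(Receiver receiver,
                                boolean videoTrackReceived,
                                boolean videoOnMessageReceived) {
        if (!videoTrackReceived) {
            return false;
        }
        switch (receiver) {
            case IOS:     // shows a received video by default
            case WEB_UI:  // legacy behaviour: enabled once video is detected
                return true;
            default:      // Android since a0fa841: waits for the explicit state
                return videoOnMessageReceived;
        }
    }
}
```

In this model only the Android receiver depends on the state message, which matches the issue being limited to calls from Android app to Android app.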

In order to solve the issue, this pull request introduces the LocalStateBroadcasterXXX helper classes that take care of sending the local state to the other participants as needed. That is:

  • Send the current state to another participant in the call when that participant joins (if the local participant is the one that joined, all the other participants joined from its point of view, so the state is sent to all of them); if the HPB is used the state is sent several times with an exponential backoff to solve the problem explained above about the connection not being established yet in the other end
  • Send the state changes to all the participants in the call; in this case a single message is sent
    • If the state is changed but a connection was not established yet the changed state should be eventually received through the initial exponential backoff (as it sends the current state, not the original state)
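
The two rules above can be sketched as follows. This is not the LocalStateBroadcaster from the pull request, just a minimal model of it: the class name, the "*" target for "all participants" and the message strings are assumptions, and the backoff timer is replaced by an explicit method call so that the behaviour is easy to follow. The point it shows is that each repetition reads the state at the time of sending.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; it only records what would be sent and to whom. */
final class BroadcasterSketch {
    static final String ALL = "*";

    final List<String> sent = new ArrayList<>();
    private final Map<String, Integer> pendingRepeats = new LinkedHashMap<>();
    private boolean videoEnabled;

    /** A participant joined: send the current state now and schedule repeats (HPB case). */
    void onParticipantJoined(String sessionId, int repeats) {
        send(sessionId);
        if (repeats > 0) {
            pendingRepeats.put(sessionId, repeats);
        }
    }

    /** Called by a timer at each backoff step; sends the state as it is now. */
    void onBackoffStep() {
        Iterator<Map.Entry<String, Integer>> it = pendingRepeats.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> entry = it.next();
            send(entry.getKey());
            if (entry.getValue() <= 1) {
                it.remove();
            } else {
                entry.setValue(entry.getValue() - 1);
            }
        }
    }

    /** A state change is sent once, to everyone. */
    void setVideoEnabled(boolean enabled) {
        if (videoEnabled != enabled) {
            videoEnabled = enabled;
            send(ALL);
        }
    }

    private void send(String target) {
        sent.add(target + ":" + (videoEnabled ? "videoOn" : "videoOff"));
    }
}
```

If participant "B" joins while the video is disabled and the video is enabled before the first backoff step, the remaining repetitions for "B" carry the enabled state rather than the original one.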

Before this pull request the state was sent only through data channels; now the state is sent through signaling messages too, as it is expected by other clients (in the past data channels were found to be problematic, but the signaling messages were more reliable and convenient, so that is why the state was moved to be sent also through signaling messages; the only exception is the speaking state, which, due to its potential frequency and being not so relevant, was left only as data channel messages to avoid hammering the database when the HPB is not used). Note that this should be backwards compatible with older Nextcloud versions, as if any signaling message was not handled it was just ignored, so there is no problem sending them.
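
The choice of transport per kind of state, as described in the paragraph above, could be expressed like this. The enum and class names are hypothetical; only the rule itself (audio and video through both transports, speaking only through data channels) comes from the description.

```java
import java.util.EnumSet;
import java.util.Set;

enum Transport { DATA_CHANNEL, SIGNALING }

enum StateKind { AUDIO, VIDEO, SPEAKING }

/** Hypothetical sketch of which transports carry which kind of state. */
final class StateRouting {
    private StateRouting() {}

    static Set<Transport> transportsFor(StateKind kind) {
        if (kind == StateKind.SPEAKING) {
            // Frequent and less relevant, so kept off the signaling channel.
            return EnumSet.of(Transport.DATA_CHANNEL);
        }
        return EnumSet.allOf(Transport.class);
    }
}
```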

The LocalStateBroadcaster (or, rather, its subclasses) also takes into account the differences between the internal or the external signaling server in order to send the messages using the appropriate connection.

All this is almost the same as what is done in the WebUI, with some small improvements:

  • In the WebUI, when another participant joins the call, the message is sent to all participants rather than only to the participant that joined.
    • This is still the case in the Android app with Janus and data channels, but there is no way around that.
  • In the WebUI the state is sent again whenever the connection changes to the connected state. Therefore the state is needlessly sent again when a connection is temporarily interrupted (changes to disconnected and then back to connected). However, that also sends the state again after an ICE restart, which would happen without HPB after the connection failed.
  • In the WebUI the state is not sent if the connection changes to the completed state; although typically the connection will go through the connected state before reaching the completed state it is possible to go directly from checking to completed, and in that case the state would not be sent.
  • In the WebUI, when the HPB is used, the state starts to be sent when a receiver connection is established with the other peer. Here the state is sent as soon as the remote participant is found, as the remote participant might already have a receiver connection to the local participant before the local participant has a receiver connection to the remote participant (although in most cases it should not make any difference, as establishing connections when the HPB is used is usually pretty fast).

Despite the improvements this approach is far from perfect and there is still an excessive amount of initial messages sent with the HPB due to the repeated sending. But this is tracked in nextcloud/spreed#8549 and it will need to be solved across all the clients at the same time.

Follow ups

  • Send nick and raised hand state in LocalStateBroadcaster
  • Send status also when no peer connection will be established (right now it is not a problem because only media state is sent, but it will be with non-media state, like the nick and raised hand)

How to test

  • Set up the HPB
  • Start a call with the Android app
  • Join the call with the Android app on another device

Result with this pull request

The video of the other participant is visible

Result without this pull request

The video of the other participant may or may not be visible (most likely it will not be)

@danxuliu danxuliu added bug Something isn't working 2. developing Work in progress feature: ☎️ call labels Dec 19, 2024
@danxuliu danxuliu force-pushed the fix-sending-local-state-to-other-participants branch 3 times, most recently from 2fd628b to d094dc5 Compare December 23, 2024 12:35
@danxuliu danxuliu added 3. to review Waiting for reviews and removed 2. developing Work in progress labels Dec 26, 2024
@danxuliu
Member Author

/backport to stable-20.1

"hasMCU" (which has always been the wrong name, because it is an SFU
rather than an MCU, but it is wrong even in the signaling server so for
now the legacy name is kept) was set again and again whenever the call
participant list changed. Now it is set instead once its value is known,
that is, when it is known that the internal signaling server is used (as
no "MCU" is used in that case), or when the connection with the external
signaling server is established, as its supported features are not known
until then.

This change should have no effect on the usages of "hasMCU", as it is
used when the call participant list changes, which will happen only after
joining the call in the signaling server, or when sending "isSpeaking"
and toggling media, in both cases guarded by "isConnectionEstablished",
which will be true only once "performCall" was called or if the call is
active with other participants.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
For now it just provides support for sending a data channel message to
all participants, so notifying all participants when the media is
toggled or the speaking status changes can be directly refactored to use
it.

While it would have been fine to use a single class for both MCU and no
MCU they were split for easier and cleaner unit testing in future
stages.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Data channel messages are expected to be sent only to peer connections
with "video" type, which provide the audio and video tracks of the
participant (and, in fact, peer connections for screen shares do not
even have data channels enabled in the WebUI).

Note that this could change if at some point several audio/video tracks
are sent in the same peer connection, or if "speaking" messages are
added to screen shares, but that will be addressed if/when that happens.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
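
A minimal sketch of the restriction described in the commit message above, assuming that each peer connection carries a type string. The "video" value comes from the message; the "screen" value mentioned for screen shares and all class names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a peer connection wrapper. */
final class PeerConnectionInfo {
    final String sessionId;
    final String type; // "video" for audio/video, "screen" (assumed) for screen shares

    PeerConnectionInfo(String sessionId, String type) {
        this.sessionId = sessionId;
        this.type = type;
    }
}

final class DataChannelTargets {
    private DataChannelTargets() {}

    /** Keeps only the connections that should receive data channel messages. */
    static List<PeerConnectionInfo> select(List<PeerConnectionInfo> connections) {
        List<PeerConnectionInfo> result = new ArrayList<>();
        for (PeerConnectionInfo connection : connections) {
            if ("video".equals(connection.type)) {
                result.add(connection);
            }
        }
        return result;
    }
}
```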
This is the counterpart of CallParticipantModel for the local
participant. For now it just stores whether audio and video are enabled
or not, and whether the local participant is speaking or not, but it
will be eventually extended with further properties.

It is also expected that the views, like the button with the microphone
state, will update themselves based on the model. Similarly the model
should be moved from the CallActivity to a class similar to
CallParticipant but for the local participant. In any case, all that is
something for the future; the immediate use of the model will be to know
when the local state changes to notify other participants.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
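
The idea of the model described in the commit message above can be sketched as an observable holder of the three flags. This is not the LocalCallParticipantModel class itself; the class name, the observer interface and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an observable model for the local participant. */
final class LocalParticipantModelSketch {
    interface Observer {
        void onChange();
    }

    private final List<Observer> observers = new ArrayList<>();
    private boolean audioEnabled;
    private boolean videoEnabled;
    private boolean speaking;

    void addObserver(Observer observer) {
        observers.add(observer);
    }

    boolean isAudioEnabled() { return audioEnabled; }
    boolean isVideoEnabled() { return videoEnabled; }
    boolean isSpeaking() { return speaking; }

    void setAudioEnabled(boolean enabled) {
        if (audioEnabled == enabled) return; // no change, no notification
        audioEnabled = enabled;
        notifyObservers();
    }

    void setVideoEnabled(boolean enabled) {
        if (videoEnabled == enabled) return;
        videoEnabled = enabled;
        notifyObservers();
    }

    void setSpeaking(boolean nowSpeaking) {
        if (speaking == nowSpeaking) return;
        speaking = nowSpeaking;
        notifyObservers();
    }

    private void notifyObservers() {
        for (Observer observer : observers) {
            observer.onChange();
        }
    }
}
```

Notifying only on actual changes keeps an observer such as a broadcaster from sending redundant messages when the same value is set twice.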
The LocalStateBroadcaster observes changes in the
LocalCallParticipantModel and notifies other participants in the call as
needed. Although it is created right before joining the call there is a
slim chance of the state changing before the local participant is
actually in the call, but even in that case other participants would not
be notified about the state due to the MessageSender depending on the
list of call participants / peer connections passed to it, which should
not be initialized before the local participant is actually in the call.

There is, however, a race condition that could cause participants to not
be added to the participant list if they join at the same time as the
local participant and a signaling message listing them but not the local
participant as in the call is received once the CallParticipantList was
created, but that is unrelated to the broadcaster and will be fixed
in another commit.

Currently only changes in the audio, speaking and video state are
notified, although in the future it should also notify about the nick,
the raised hand or any other state (but not one-time events, like
reactions). The notifications right now are sent only through data
channels, but at a later point they will be sent also through signaling
messages as needed.

Similarly, although right now it only notifies of changes in the state
it will also take care of notifying other participants about the current
state when they join the call (or the local participant joins).

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
This is not possible when Janus is used, as Janus only allows
broadcasting data channel messages to all the subscribers of the
publisher connection.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Note that this implicitly sends the current state to remote participants
when the local participant joins, as in that case all the remote
participants already in the call join from the point of view of the
local participant.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Signed-off-by: Daniel Calviño Sánchez <[email protected]>
This will be used to have separate counts for data channel and signaling
messages.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
The speaking state is still sent only through data channels, as it is
not currently handled by other clients when sent through signaling
messages.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
@danxuliu danxuliu force-pushed the fix-sending-local-state-to-other-participants branch from d094dc5 to 80e37f0 Compare January 7, 2025 01:56
Contributor

github-actions bot commented Jan 7, 2025

APK file: https://www.kaminsky.me/nc-dev/android-artifacts/4558-talk.apk

To test this change/fix you can simply download the APK file above, install it and test it in parallel to your existing Nextcloud Talk app.

Contributor

github-actions bot commented Jan 7, 2025

Codacy

Lint

Type      master  PR
Warnings  158     158
Errors    71      71

SpotBugs

Category                      Base  New
Bad practice                  6     6
Correctness                   222   222
Dodgy code                    71    71
Internationalization          3     3
Malicious code vulnerability  3     3
Performance                   4     4
Security                      1     1
Total                         310   310

@danxuliu danxuliu marked this pull request as ready for review January 7, 2025 02:23
@danxuliu danxuliu requested a review from mahibi January 7, 2025 02:23
Labels
3. to review Waiting for reviews backport-request bug Something isn't working feature: ☎️ call
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Video calls using HPB shows no video from android app to android app
1 participant