Section 3.10.2: Introducing app content-decisions at the encoded video level isn't webby #101

Open
jan-ivar opened this issue Jan 24, 2023 · 2 comments


@jan-ivar
Member

§ 3.10.2 Transmitting stored encoded media ...

  • "Wait signals" or "your message is important to us, please stay on the line", inserted prior to switching to a live interactive stream.
  • Insertion of announcements or alarm signals in otherwise live streams.
  • Insertion of static content (such as profile pictures) when the sender temporarily disables a camera.

These kinds of features already exist today, and are implemented client-side using existing web technology.

Moving such app decisions to the sender seems like a step backwards, reminiscent of dial tone and DTMF, whose justification was non-web-client recipients (and which come from another era).

Even if we can produce compelling use cases that hinge on video being switched out sender-side, we have replaceTrack for that already, and VideoTrackGenerator to bring in other sources of media. Operating on decoded media is simple, well-supported and allows local playback (self-view) of what is being sent. Re-encoding also ensures uniformity and scalability (e.g. SVC encoding).
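As a minimal sketch of the replaceTrack approach mentioned above: the helper name `switchOutgoingVideo` and the `SenderLike` type are hypothetical, introduced only for illustration. `SenderLike` mirrors the one method of `RTCRtpSender` that is used here, so the snippet compiles outside a browser; in a real page the sender would come from `RTCPeerConnection`, and the placeholder track could come from a canvas capture or a VideoTrackGenerator.

```typescript
// Structural stand-in for the part of RTCRtpSender used below.
// In a browser, an actual RTCRtpSender with Track = MediaStreamTrack fits this shape.
interface SenderLike<Track> {
  replaceTrack(track: Track | null): Promise<void>;
}

// Hypothetical helper: send the live camera track while the camera is on,
// and a placeholder track (e.g. a profile picture) while it is off.
function switchOutgoingVideo<Track>(
  sender: SenderLike<Track>,
  cameraOn: boolean,
  cameraTrack: Track,
  placeholderTrack: Track,
): Promise<void> {
  // The swap happens on decoded media; the browser keeps encoding as before.
  return sender.replaceTrack(cameraOn ? cameraTrack : placeholderTrack);
}
```

Because the swap happens on a decoded track, the same track can also be attached to a local video element for self-view.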

Moving this logic to the encoded layer would seem to require significant API changes, which don't seem justified merely to optimize out a decoding/re-encoding step from time to time.

E.g. I can imagine a use case where an online teacher wants the audience to see a training video instead of the teacher. But in that case, the better app IMHO is the one that sends the video using state-of-the-art tech for this (e.g. MSE, which gives audience members/end-users the ability to pause/resume), not the one that encodes it into the WebRTC camera stream to simplify life for the web developer dealing with a unified RTCPeerConnection API, at a cost to end-users.

§ 3.10.3 Decoding pre-encoded media ... we have pre-encoded media (either dynamically generated or stored) that we wish to process in the same way as one processes media coming in over a PeerConnection

The "we" here seems to refer to web developers, not end-users. It therefore implies no benefits to end-users, and is not an acceptable use case to me. Moreover, even for web developers, it's not clear what benefits, if any, come from treating non-WebRTC media as WebRTC media. An RTCPeerConnection outputs a MediaStreamTrack, which seems a versatile enough integration point.

@alvestrand
Contributor

This argument seems to hinge on the idea that non-encoded media is as easy to handle as encoded media for the "insertion" cases.

Non-encoded media is an order of magnitude larger than encoded media, and the transformation takes significant cycles.
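A rough back-of-the-envelope check of the size claim, as a sketch only: the resolution, frame rate, I420 pixel format and the 2 Mbit/s encoded bitrate below are all illustrative assumptions, not figures from this thread.

```typescript
// Raw I420 video uses 1.5 bytes per pixel:
// one full-resolution luma plane plus two quarter-resolution chroma planes.
function rawI420BitsPerSecond(width: number, height: number, fps: number): number {
  const bytesPerFrame = width * height * 1.5;
  return bytesPerFrame * fps * 8;
}

// Assumed example: 1280x720 at 30 frames per second.
const rawBps = rawI420BitsPerSecond(1280, 720, 30);

// Assumed encoded bitrate for a comparable stream (illustrative only).
const assumedEncodedBps = 2000000;

// How many times larger the raw stream is under these assumptions.
const sizeRatio = rawBps / assumedEncodedBps;
```

Under these assumptions the raw stream is roughly 332 Mbit/s, so the gap is at least the order of magnitude stated above.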

We know that people are bothered by the heat generated by our current conferencing applications. This use case is predicated on the theory that saving encode/decode cycles is a Good Thing.

Illogical point: Web developers don't have relevant pre-encoded media (except for testing). The users have media. So the last paragraph makes no sense to me.

@aboba changed the title from "Introducing app content-decisions at the encoded video level isn't webby" to "Section 3.10.2: Introducing app content-decisions at the encoded video level isn't webby" on May 15, 2023
@dontcallmedom-bot

This issue was mentioned in WEBRTCWG-2023-05-16 (Page 15)
