Detailed example of potential value of A/V/data sync in WebRTC #74

Open
darkvertex opened this issue Feb 2, 2022 · 1 comment

@darkvertex

[I originally wrote this in the W3C strategy repo and @dontcallmedom suggested I drop an issue here for greater visibility, so here goes...]

I'd like to share with you a first-hand VR-related use case where synced A/V/data could have been very useful at my job:

My team needed to deliver N concurrent synced video feeds from a multi-lens VR camera rig from a location with poor computational capacity (too low for live-stitching panoramic video onsite). For reasons I cannot disclose, we needed to livestream 360 video with a VR camera that wasn't able to livestream a stitched 360 video natively out of the box. The workaround we decided on was to receive the individual feeds elsewhere with a more powerful computer and produce the stitched 360 monoscopic (i.e. non-3D) panoramic video to stream to whatever. (You can conceptualize each lens feed as an RTP/WebRTC video track.)

Our camera had 8 physical lenses arranged horizontally, but with just 4 we already had sufficient panoramic coverage, so to save some bandwidth we only sent 4... but which 4? That depends on what's visible and near which lens; maybe we want the 4 even ones or the 4 odd ones. We designed the sending software to let you pick a subset of the cameras. We could switch which ones were active on a whim, mid-stream.

One approach to dynamic feed switching in WebRTC could be to prenegotiate all the tracks you could possibly need and only send video on those you consider active, but it's a little tricky to distinguish between a video feed being suddenly inactive because it was intentionally disabled by a reconfiguration at the sender versus a video feed being suddenly inactive because of network congestion or data loss down the pipe. Renegotiating WebRTC video tracks between configuration switches is possible, but we felt it interrupted the flow considerably as it added overhead, so we didn't go with it.

We needed the camera configuration and identity metadata to be timestamped with the video frames so the two could be correlated in perfect sync during reconfigurations. (Feed 1 may be showing camera 0, but maybe five seconds later it's showing camera 1, for example.) Identity matters for a realtime panoramic stitch because the cameras are different perspectives in space and the algorithm must be kept informed or the result will look wonky. Unsynced changes are not useful as they glitch the result of the 360 processing, and a WebRTC data channel (to my knowledge) could not do this with today's WebRTC generation.
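To make the "Feed 1 shows camera 0, then camera 1 five seconds later" example concrete, here is a minimal sketch of the lookup a stitcher would need. It is not our actual code: the `FeedTimeline` class, its method names, and the 90 kHz timestamps are assumptions for illustration, and RTP timestamp wraparound is ignored.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FeedTimeline:
    """Hypothetical helper: remembers which physical camera each feed carries,
    keyed by the video timestamp at which a configuration takes effect."""

    _timestamps: List[int] = field(default_factory=list)
    _mappings: List[Dict[int, int]] = field(default_factory=list)

    def record(self, timestamp: int, mapping: Dict[int, int]) -> None:
        # Keep both lists sorted by timestamp so lookups can use binary search.
        idx = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(idx, timestamp)
        self._mappings.insert(idx, dict(mapping))

    def mapping_at(self, timestamp: int) -> Dict[int, int]:
        # The active configuration is the latest one at or before this frame.
        idx = bisect.bisect_right(self._timestamps, timestamp) - 1
        if idx < 0:
            raise LookupError("no configuration known for this timestamp")
        return self._mappings[idx]


timeline = FeedTimeline()
timeline.record(0, {1: 0})        # feed 1 starts out showing camera 0
timeline.record(450_000, {1: 1})  # 5 s later (90 kHz clock) it shows camera 1
camera_before = timeline.mapping_at(449_999)[1]
camera_after = timeline.mapping_at(450_000)[1]
```

The lookup only gives a correct answer if the configuration change carries the same timestamp as the first frame it applies to, which is exactly what an unsynchronized data channel cannot guarantee.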

We absolutely needed the camera identity to be in sync with the video frames. Since WebRTC data channels fell short, we simplified further and settled on a pure RTP approach. We opted to hijack the outgoing H264 bitstream and inject NAL units of type SEI (Supplemental Enhancement Information) with payload type 5, aka "user data unregistered" SEI, before the frame image data in the RTP video track. You can slip small amounts of user data (text or JSON or whatever) into the video feed this way and not corrupt anything. Video players that don't recognize the payload safely ignore it.

On a custom software receiver (not a regular browser) you can recover the H264 packets from the RTP track, recover the original NAL units, and read the metadata that precedes each frame. Your video processing can then react accordingly and instantly, since the metadata changes in perfect sync with the video frames.


If either data channels could be in sync with video without codec gymnastics OR if another convenient mechanism existed for a generic timestamped metadata stream, I think we may have stuck with WebRTC for our use case. (I personally would have liked that as it could have made it easier to debug things from a web app in-browser instead of some custom standalone software.)

Ultimately, data being in sync with video is important to any kind of "realtime actor" with a need for a status HUD, for example:

  • imagine a first-person-view flying drone web app where you can control it and there's a HUD overlay showing the live gyroscope data in perfect sync with the video,
  • or one of those creepy walking robot dogs, with charts graphing the servo rotations overlaid on top, where you can see exactly when one of them jams because you know the data is in sync with the video showing you the same.

Sending device health and state information in perfect sync with the video feed is crucial for a trustworthy assessment of what's happening on screen with a remote entity. Being able to do this in an official and reliable capacity would be very exciting!


Sorry if I was a little verbose in my explanation. Hope it helps shed some light on why synced A/V/data could open the door to some very handy, exciting and useful in-browser use case scenarios!

@aboba
Collaborator

aboba commented Feb 2, 2022

Thank you for the detailed explanation.

The existing AR/VR use case is currently not enabled by WebRTC Encoded Transform due to the inability to control packetization. For example, in the existing API, if the additional data is too large, the video packet might exceed the MTU size, or the congestion control algorithm could malfunction. So we could add that to the requirements for the use case.

Another avenue toward addressing your use case might be via WebCodecs, along with a transport such as RTCDataChannel (P2P) or WebTransport (C/S). Although neither of those transports is optimized for realtime communications, where the media flows primarily from server -> client, the server side can implement a realtime congestion control algorithm such as Google CC or SCREAM. However, if communication is bidirectional, then this approach won't work.
