There seem to be existing standards in this space, such as GETlivecap, although it may look a bit dated today in its use of polling rather than, say, WebSockets.
The mailing list also mentioned WebVTT, though that format seems less suited to live captioning.
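For context, a minimal WebVTT file looks like the following. Each cue carries both a start and an end timestamp up front, which is natural for prerecorded media but awkward for live captioning, where end times are not known when the text is first produced:

```
WEBVTT

00:00:01.000 --> 00:00:03.500
Hello, and welcome to the call.

00:00:03.500 --> 00:00:06.000
Let's get started.
```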
The WebRTC Next Version Use Cases doc lists three use cases under "Funny hats" that involve plain text associated with media: Captioning, Transcription, and Language translation. However, there do not seem to be any new requirements listed for handling the human-readable text generated or manipulated in these use cases.
Later there is requirement N23: "The user agent must be able to send data synchronized with audio and video." However, I don't think that covers the support required for captioning, transcription, and language translation: the receiving side must be able to interpret the data as human-readable text, which implies the format of the data should be further standardized.
I propose that the doc explicitly state that these use cases require sending/receiving human-readable text in parallel with other media, such that a received WebRTC stream connected directly to an HTMLMediaElement exposes textTracks representing the sent text. It should also state that text tracks can be generated and processed like other raw media streams, per requirement N19: "The application must be able to insert processed frames into the outgoing media path."
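To illustrate, here is a hypothetical sketch of what receiver-side application code might shrink to under the proposed model. Nothing in today's specs populates `video.textTracks` from a received WebRTC stream; this assumes the user agent would surface text sent in parallel with the media as text tracks automatically:

```javascript
// Hypothetical: under this proposal the user agent, not the application,
// surfaces text sent alongside the media as entries in video.textTracks.
// The app only has to enable display; rendering, user styling, and
// translation would be handled by the browser.
function showReceivedCaptions(videoElement) {
  let shown = 0;
  for (const track of videoElement.textTracks) {
    if (track.kind === 'captions') {
      track.mode = 'showing';
      shown += 1;
    }
  }
  return shown; // number of caption tracks enabled
}
```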
If this is not standardized, supporting these accessibility-enhancing features becomes much more difficult. Applications must invent a protocol for the text tracks and include code to encode/decode them to/from data channels, transforming them into calls to TextTrack.addCue. More likely, we'll see so-called "open captioning" where the text is rendered onto the video frames. Open captioning makes it impossible for users to adjust the format, size, location, etc. of the captions based on their needs, makes it impossible for the browser to automatically translate the captions to the user's language, and potentially covers/hides important information in the video. Open captioning also doesn't work well for users who have difficulty both hearing and seeing.
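For comparison, a minimal sketch of the ad-hoc protocol every application currently has to invent. The wire format and the `encodeCaption`/`decodeCaption` names are hypothetical; the data-channel and text-track calls (`RTCDataChannel.send`, `HTMLMediaElement.addTextTrack`, `TextTrack.addCue`, `VTTCue`) are the standard APIs:

```javascript
// Hypothetical wire format: each caption cue is a small JSON message.
function encodeCaption(start, end, text, lang) {
  return JSON.stringify({ start, end, text, lang });
}

function decodeCaption(message) {
  const { start, end, text, lang } = JSON.parse(message);
  return { start, end, text, lang };
}

// Sender side: push cues over an RTCDataChannel opened alongside the media.
function sendCaption(dataChannel, cue) {
  dataChannel.send(encodeCaption(cue.start, cue.end, cue.text, cue.lang));
}

// Receiver side: translate incoming messages into TextTrack cues by hand.
function attachCaptionChannel(dataChannel, videoElement) {
  const track = videoElement.addTextTrack('captions', 'Live captions', 'en');
  track.mode = 'showing';
  dataChannel.onmessage = (event) => {
    const cue = decodeCaption(event.data);
    track.addCue(new VTTCue(cue.start, cue.end, cue.text));
  };
}
```

Every pair of applications that does this independently ends up with an incompatible protocol, which is exactly the interoperability gap standardization would close.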
Additionally, for the language translation use case, we should consider supporting the kind and language categorizations for WebRTC audio and video tracks.
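HTML text tracks already model both categorizations (`TextTrack.kind` and `TextTrack.language`, a BCP 47 tag), while a WebRTC MediaStreamTrack exposes neither, so language metadata must travel out of band today. A small illustrative helper (the function name is hypothetical; the attributes are the real TextTrack ones):

```javascript
// TextTrack already carries kind ('captions', 'subtitles', ...) and
// language (BCP 47 tag, e.g. 'en-US'); MediaStreamTrack has no equivalent.
function pickTrackForLanguage(textTracks, preferredLang) {
  for (const track of textTracks) {
    if (track.kind === 'captions' && track.language === preferredLang) {
      return track;
    }
  }
  return null;
}
```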