There seem to be existing standards in this space, such as GETlivecap, although it may look a bit dated today in its use of polling rather than, say, WebSockets.
The mailing list also mentioned WebVTT, though that format seems less suited to live captioning.
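For context, a minimal WebVTT file looks like the following. Each cue carries both a start and an end timestamp up front, which is natural for prerecorded media but awkward for live captioning, where end times are not known when the text is first produced:

```
WEBVTT

00:00:01.000 --> 00:00:03.500
Hello, and welcome to the call.

00:00:03.500 --> 00:00:06.000
Let's get started.
```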
The WebRTC Next Version Use Cases doc lists three use cases under "Funny hats" that involve plain text associated with media: Captioning, Transcription, and Language translation. However, there do not seem to be any new requirements listed for handling the human-readable text generated or manipulated in these use cases.
Later there is requirement N23: "The user agent must be able to send data synchronized with audio and video." However, I don't think that covers the support required for captioning, transcription, and language translation: the receiving side must be able to interpret the data as human-readable text, which implies the format of the data should be further standardized.
I propose that the doc explicitly state that these use cases require sending/receiving human-readable text in parallel with other media, such that a received WebRTC stream connected directly to an HTMLMediaElement exposes textTracks representing the sent text. It should also state that text tracks can be generated and processed like other raw media streams, per requirement N19: "The application must be able to insert processed frames into the outgoing media path."
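To illustrate, here is a hypothetical sketch of what receiver-side application code might shrink to under the proposed model. Nothing in today's specs populates `video.textTracks` from a received WebRTC stream; this assumes the user agent would surface text sent in parallel with the media as text tracks automatically:

```javascript
// Hypothetical: under this proposal the user agent, not the application,
// surfaces text sent alongside the media as entries in video.textTracks.
// The app only has to enable display; rendering, user styling, and
// translation would be handled by the browser.
function showReceivedCaptions(videoElement) {
  let shown = 0;
  for (const track of videoElement.textTracks) {
    if (track.kind === 'captions') {
      track.mode = 'showing';
      shown += 1;
    }
  }
  return shown; // number of caption tracks enabled
}
```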
If this is not standardized, supporting these accessibility-enhancing features becomes much more difficult. Applications must invent a protocol for the text tracks and include code to encode/decode them to/from data channels, transforming them into calls to TextTrack.addCue. More likely, we'll see so-called "open captioning" where the text is rendered onto the video frames. Open captioning makes it impossible for users to adjust the format, size, location, etc. of the captions based on their needs, makes it impossible for the browser to automatically translate the captions to the user's language, and potentially covers/hides important information in the video. Open captioning also doesn't work well for users who have difficulty both hearing and seeing.
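For comparison, a minimal sketch of the ad-hoc protocol every application currently has to invent. The wire format and the `encodeCaption`/`decodeCaption` names are hypothetical; the data-channel and text-track calls (`RTCDataChannel.send`, `HTMLMediaElement.addTextTrack`, `TextTrack.addCue`, `VTTCue`) are the standard APIs:

```javascript
// Hypothetical wire format: each caption cue is a small JSON message.
function encodeCaption(start, end, text, lang) {
  return JSON.stringify({ start, end, text, lang });
}

function decodeCaption(message) {
  const { start, end, text, lang } = JSON.parse(message);
  return { start, end, text, lang };
}

// Sender side: push cues over an RTCDataChannel opened alongside the media.
function sendCaption(dataChannel, cue) {
  dataChannel.send(encodeCaption(cue.start, cue.end, cue.text, cue.lang));
}

// Receiver side: translate incoming messages into TextTrack cues by hand.
function attachCaptionChannel(dataChannel, videoElement) {
  const track = videoElement.addTextTrack('captions', 'Live captions', 'en');
  track.mode = 'showing';
  dataChannel.onmessage = (event) => {
    const cue = decodeCaption(event.data);
    track.addCue(new VTTCue(cue.start, cue.end, cue.text));
  };
}
```

Every pair of applications that does this independently ends up with an incompatible protocol, which is exactly the interoperability gap standardization would close.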
Additionally, for the language translation use case, we should consider supporting the kind and language categorizations for WebRTC audio and video tracks.
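HTML text tracks already model both categorizations (`TextTrack.kind` and `TextTrack.language`, a BCP 47 tag), while a WebRTC MediaStreamTrack exposes neither, so language metadata must travel out of band today. A small illustrative helper (the function name is hypothetical; the attributes are the real TextTrack ones):

```javascript
// TextTrack already carries kind ('captions', 'subtitles', ...) and
// language (BCP 47 tag, e.g. 'en-US'); MediaStreamTrack has no equivalent.
function pickTrackForLanguage(textTracks, preferredLang) {
  for (const track of textTracks) {
    if (track.kind === 'captions' && track.language === preferredLang) {
      return track;
    }
  }
  return null;
}
```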