<!doctype html>
<html lang="en-us">
<head>
  <title>Media Capture and Streams Extensions</title>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  <script src="https://www.w3.org/Tools/respec/respec-w3c" class="remove"></script>
  <script class='remove'>
  "use strict";
  // See https://github.com/w3c/respec/wiki/ for how to configure ReSpec
  var respecConfig = {
    group: "webrtc",
    xref: ["html", "infra", "permissions", "dom", "mediacapture-streams", "webaudio", "webidl"],
    edDraftURI: "https://w3c.github.io/mediacapture-extensions/",
    editors:  [
      {name: "Jan-Ivar Bruaroey", company: "Mozilla Corporation", w3cid: 79152},
    ],
    shortName: "mediacapture-extensions",
    specStatus: "ED",
    subjectPrefix: "[mediacapture-extensions]",
    github: "https://github.com/w3c/mediacapture-extensions",
  };
  </script>
</head>

<body>
  <section id="abstract">
    <p>This document defines a set of ECMAScript APIs in WebIDL to extend the [[mediacapture-streams]] specification.</p>
  </section>
  <section id="sotd">
    <p>This is an unofficial proposal.</p>
  </section>
  <section id="introduction">
    <h2>Introduction</h2>
    <p>This document contains proposed extensions and modifications to the
    [[mediacapture-streams]] specification.</p>
    <p>New features and modifications to existing features proposed here may be
    considered for addition into the main specification post Recommendation.
    Deciding factors will include maturity of the extension or modification,
    consensus on adding it, and implementation experience.</p>
    <p>A concrete long-term goal is reducing the fingerprinting surface of
    {{MediaDevices/enumerateDevices()}} by deprecating exposure of the device
    {{MediaDeviceInfo/label}} in its results. This requires relieving
    applications of the burden of building user interfaces to select cameras and
    microphones in-content, by offering this in user agents as part of
    {{MediaDevices/getUserMedia()}} instead.</p>
    <p>Miscellaneous other smaller features are under consideration as well,
    such as constraints to control multi-channel audio beyond stereo.</p>
  </section>
  <section>
    <h2>Terminology</h2>
    <p>
      This document uses the definitions {{MediaDevices}}, {{MediaStreamTrack}},
      {{MediaStreamConstraints}} and {{ConstrainablePattern}}
      from [[!mediacapture-streams]].
    </p>
    <p>The terms [=permission state=], [=request permission to use=], and
    <a data-cite="permissions">prompt the user to choose</a> are defined in
    [[!permissions]].</p>
  </section>
  <section id="conformance">
  </section>
  <section id="camera and microphone picker">
    <h2>In-browser camera and microphone picker</h2>
    <p>The existing {{MediaDevices/enumerateDevices()}} function exposes camera
    and microphone {{MediaDeviceInfo/label}}s to let applications build
    in-content user interfaces for camera and microphone selection. Applications
    have had to do this because {{MediaDevices/getUserMedia()}} did not offer a
    web compatible in-agent device picker. This specification aims to rectify
    that.</p>
    <p>Due to the significant fingerprinting vector caused by device
    {{MediaDeviceInfo/label}}s, and the well-established nature of the existing
    APIs, the scope of this particular effort is limited to removing
    {{MediaDeviceInfo/label}}, leaving the overall constraints-based model
    intact. Keeping the model intact offers applications a more viable
    migration path than moving them to a new, less powerful API would.</p>
    <p>For the same reason, this specification augments the existing
    {{MediaDevices/getUserMedia()}} function instead of introducing a new,
    less powerful API to compete with it.</p>
    <section id="new getusermedia semantics">
      <h3>getUserMedia "user-chooses" semantics</h3>
      <p>This specification introduces
      slightly altered semantics to the {{MediaDevices/getUserMedia()}}
      function called <code>"user-chooses"</code> that guarantee a picker will
      be shown to the user in cases where the user agent would otherwise choose
      for the user (that is: when application constraints do not narrow down
      the choices to a single device). This is orthogonal to permission, and
      offers a better and more consistent user experience across applications
      and user agents.
      </p>
      <p>Unfortunately, since the <code>"user-chooses"</code> semantics may
      produce user agent prompts at different times and in different situations
      compared to the old semantics, they are somewhat incompatible with
      expectations in some existing web applications that tend to call
      {{MediaDevices/getUserMedia()}} repeatedly and lazily instead of using
      e.g. <code>stream.clone()</code>.</p>
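      <p>As a rough sketch of the alternative mentioned above, an application
      could request a stream once and hand out clones of it afterwards, so that
      later callers do not provoke another prompt. The helper name
      <code>createStreamSource</code> and the injected
      <code>requestStream</code> callback are hypothetical and exist only to
      keep the sketch self-contained; in a page, the callback would typically
      wrap {{MediaDevices/getUserMedia()}}.</p>
      <pre class="example">
```javascript
// Hypothetical helper: requests a stream once and hands out clones afterwards.
// `requestStream` is an assumed callback, for example
//   () => navigator.mediaDevices.getUserMedia({video: true})
function createStreamSource(requestStream) {
  let pending = null; // promise for the single underlying request
  return async function getStream() {
    if (pending === null) {
      pending = requestStream();
      // Forget a failed request so that a later call can retry.
      pending.catch(() => { pending = null; });
    }
    const stream = await pending;
    // Each caller gets its own clone, which it can stop independently.
    return stream.clone();
  };
}
```
      </pre>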
    </section>
    <section id="web compatibility">
      <h3>Web compatibility and migration</h3>
      <p>User agents are encouraged to provide the new semantics as opt-in
      initially for web compatibility. User agents MUST deprecate (remove)
      {{MediaDeviceInfo/label}} from {{MediaDeviceInfo}} over time, though specific migration strategies
      are left to user agents. User agents SHOULD migrate to offering the new
      semantics by default (opt-out) over time.</p>
      <p>Since the constraints-model remains intact, web compatibility problems
      are expected to be limited to:</p>
      <ul>
        <li>
          Sites that never migrated show e.g. "Camera 1", "Camera 2" etc.
          instead of descriptive device labels
        </li>
        <li>
          Sites with no device management strategy provoke a picker in the
          user agent on every visit for users with more than one camera or
          microphone to choose from (a feature of sorts)
        </li>
      </ul>
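      <p>The first item above can be illustrated with a small sketch of how a
      site might derive display names once labels are no longer exposed. The
      function name <code>displayNames</code> and the chosen prefixes are
      assumptions made for illustration; the input is assumed to be shaped like
      the result of {{MediaDevices/enumerateDevices()}}.</p>
      <pre class="example">
```javascript
// Hypothetical helper: builds display names for devices whose label is empty.
// `devices` is assumed to be an array of objects with `kind` and `label`.
function displayNames(devices) {
  const prefixes = {
    videoinput: "Camera",
    audioinput: "Microphone",
    audiooutput: "Speaker"
  };
  const counters = {};
  return devices.map(({kind, label}) => {
    // Number devices per kind, in enumeration order.
    counters[kind] = (counters[kind] || 0) + 1;
    // Prefer a real label when the user agent still exposes one.
    return label || `${prefixes[kind] || "Device"} ${counters[kind]}`;
  });
}
```
      </pre>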
    </section>
    <section id="mediadevices-interface">
      <h3>MediaDevices Interface Extensions</h3>
      <div>
        <pre class="idl"
>partial interface MediaDevices {
  readonly attribute GetUserMediaSemantics defaultSemantics;
};</pre>
      </div>
      <section>
        <h2>Attributes</h2>
        <dl data-link-for="MediaDevices" data-dfn-for="MediaDevices"
        class="attributes">
          <dt id="def-mediadevices-defaultSemantics"><dfn><code>defaultSemantics</code></dfn>
          of type <span class="idlAttrType"><a>GetUserMediaSemantics</a></span>, readonly</dt>
          <dd>
            <p>The default semantics of {{MediaDevices/getUserMedia()}} in this
            user agent.</p>
            <p>User agents SHOULD default to <code>"browser-chooses"</code>
            for backwards compatibility, until a transition plan has been
            enacted where a majority of user agents collectively switch their
            defaults to <code>"user-chooses"</code> for improved user privacy,
            and usage metrics suggest this transition is feasible without
            major breakage.</p>
          </dd>
        </dl>
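          <p>A minimal sketch of how an application might read this attribute
          follows. The function names are hypothetical, and the fallback to
          <code>"browser-chooses"</code> when the attribute is absent is an
          assumption about user agents that predate this proposal.</p>
          <pre class="example">
```javascript
// Returns the semantics a getUserMedia() call without a `semantics` member
// is expected to get. `mediaDevices` is assumed to be navigator.mediaDevices
// or a stand-in object. Assumption: user agents without the attribute
// behave as "browser-chooses".
function effectiveDefaultSemantics(mediaDevices) {
  return "defaultSemantics" in mediaDevices
    ? mediaDevices.defaultSemantics
    : "browser-chooses";
}

// True when the application has to pass `semantics: "user-chooses"` itself
// in order to ask for a picker.
function needsExplicitOptIn(mediaDevices) {
  return effectiveDefaultSemantics(mediaDevices) !== "user-chooses";
}
```
          </pre>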
      </section>
    </section>
    <section id="mediastreamconstraints-dictionary-extensions">
      <h3>MediaStreamConstraints dictionary extensions</h3>
      <div>
        <pre class="idl"
>partial dictionary MediaStreamConstraints {
  GetUserMediaSemantics semantics;
};</pre>
        <section>
          <h2>Dictionary {{MediaStreamConstraints}} Members</h2>
          <dl data-link-for="MediaStreamConstraints" data-dfn-for=
          "MediaStreamConstraints" class="dictionary-members">
            <dt><dfn><code>semantics</code></dfn> of type <span class=
            "idlMemberType">{{GetUserMediaSemantics}}</span></dt>
            <dd>
              <p>In cases where the specified constraints do not narrow
              multiple choices between devices down to one per kind, specifies
              how the final determination of which devices to pick from the
              remaining choices MUST be made. If not specified, then the
              <a data-link-for="MediaDevices">defaultSemantics</a> are used.
              </p>
            </dd>
          </dl>
        </section>
      </div>
    </section>
    <section id="getusermediasemantics-enum">
      <h3>GetUserMediaSemantics enum</h3>
      <div>
        <pre class="idl"
>enum GetUserMediaSemantics {
  "browser-chooses",
  "user-chooses"
};</pre>
        <table data-link-for="GetUserMediaSemantics" data-dfn-for=
        "GetUserMediaSemantics" class="simple">
          <tbody>
            <tr>
              <th colspan="2"><dfn>GetUserMediaSemantics</dfn> Enumeration
              description</th>
            </tr>
            <tr>
              <td><dfn><code id=
              "idl-def-GetUserMediaSemantics.browser-chooses">browser-chooses</code></dfn></td>
              <td>
                <p>When application-specified constraints do not narrow multiple
                choices between devices down to one per kind, the user agent is
                allowed to make the final determination between the remaining
                choices.
                </p>
              </td>
            </tr>
            <tr>
              <td><dfn><code id=
              "idl-def-GetUserMediaSemantics.user-chooses">user-chooses</code></dfn></td>
              <td>
                <p>When application-specified constraints do not narrow
                multiple choices between devices down to one per kind, the user
                agent MUST
                <a data-cite="permissions">prompt the user to choose</a>
                between the remaining choices, even if the application already
                has permission to some or all of them.</p>
              </td>
            </tr>
          </tbody>
        </table>
      </div>
    </section>
    <section>
      <h2>Algorithms</h2>
      <p>When the {{MediaDevices/getUserMedia()}} method is invoked, run the
      following steps before invoking the {{MediaDevices/getUserMedia()}}
      algorithm:</p>
      <ol>
          <li>
            <p>Let <var>mediaDevices</var> be the object on which this method was
            invoked.</p>
          </li>
          <li>
            <p>Let <var>constraints</var> be the method's first argument.</p>
          </li>
          <li>
            <p>Let <var>semanticsPresent</var> be <code>true</code> if
            <var>constraints</var><code>.semantics</code> [= map/exists =],
            otherwise <code>false</code>.</p>
          </li>
          <li>
            <p>Let <var>semantics</var> be
            <var>constraints</var><code>.semantics</code>
            if <a href="https://heycam.github.io/webidl/#dfn-present">present</a>,
            or the value of <var>mediaDevices</var><code>.<a data-link-for="MediaDevices">defaultSemantics</a></code>
            otherwise.</p>
          </li>
          <li>
            <p>Replace step 6.5.1. of the {{MediaDevices/getUserMedia()}}
            algorithm in its entirety with the following two steps:</p>
            <ol>
              <li>
                <p>Let <var>descriptor</var> be a {{PermissionDescriptor}}
                with its {{PermissionDescriptor/name}} member set to the permission name
                associated with <var>kind</var> (e.g. {{PermissionName/"camera"}} for
                <code>"video"</code>, {{PermissionName/"microphone"}} for <code>"audio"</code>), and,
                optionally, consider its {{DevicePermissionDescriptor/deviceId}} member set to any appropriate
                device's <var>deviceId</var>.</p>
              </li>
              <li>
                <p>If the number of unique devices sourcing tracks of
                media type <var>kind</var> in <var>candidateSet</var>
                is greater than <code>1</code> and
                <var>semantics</var> is <code>"user-chooses"</code>,
                then <a>prompt the user to choose</a> a device with
                <var>descriptor</var>, resulting in provided media.
                Otherwise, <a>request permission to use</a> a
                device with <var>descriptor</var>, treating every device
                that is attached to a live and
                <a>same-permission</a> {{MediaStreamTrack}} in the current
                [=browsing context=] as having [=permission state=]
                {{PermissionState/"granted"}}, resulting in provided media.</p>
                <p><dfn>Same-permission</dfn> in this context means a
                {{MediaStreamTrack}} that required the same level of
                permission to obtain as what is being requested.</p>
                <p>When asking the user’s permission, the user agent
                MUST disclose whether permission will be granted only to
                the device chosen, or to all devices of that
                <var>kind</var>.</p>
                <p>Let <var>track</var> be the provided media, which
                MUST be precisely one track of type <var>kind</var> from
                <var>finalSet</var>. If <var>semantics</var> is
                <code>"browser-chooses"</code> then the decision of
                which track to choose from <var>finalSet</var> is up
                to the User Agent, which MAY use the value of the computed
                "fitness distance" from the <a href=
                "https://www.w3.org/TR/mediacapture-streams/#dfn-selectsettings">
                SelectSettings</a>
                algorithm, the value of <var>semanticsPresent</var>,
                or any other internally-available information about
                the devices, as inputs to its decision.
                If <var>semantics</var> is <code>"user-chooses"</code>,
                and the application has not narrowed down the choices
                to one, then the user agent MUST ask the user to make
                the final selection.</p>
                <p>Once selected, the source of the
                {{MediaStreamTrack}} MUST NOT change.</p>
                <p>User Agents are encouraged to default to or present
                a default choice based primarily on fitness distance,
                and secondarily on the user's primary or system default
                device for <var>kind</var> (when possible). User Agents
                MAY allow users to use any media source, including
                pre-recorded media files.</p>
              </li>
            </ol>
          </li>
      </ol>
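      <p>The decision made by the steps above can be modeled on plain data as
      follows. This is only an illustrative sketch, not user agent code: the
      function names are hypothetical, <code>defaultSemantics</code> stands for
      the value of the attribute on <var>mediaDevices</var>, and
      <code>deviceCount</code> stands for the number of unique devices of one
      <var>kind</var> in <var>candidateSet</var>.</p>
      <pre class="example">
```javascript
// Mirrors the steps that compute semanticsPresent and semantics.
// `constraints` stands for the first argument of getUserMedia().
function resolveSemantics(constraints, defaultSemantics) {
  const semanticsPresent = constraints.semantics !== undefined;
  const semantics = semanticsPresent ? constraints.semantics : defaultSemantics;
  return {semanticsPresent, semantics};
}

// Mirrors the replacement step: returns the name of the permissions
// algorithm that would be used for one kind of device.
function chooseAction(semantics, deviceCount) {
  if (semantics !== "user-chooses") {
    return "request permission to use";
  }
  // A picker is only required when more than one device remains.
  return deviceCount > 1
    ? "prompt the user to choose"
    : "request permission to use";
}
```
      </pre>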
    </section>
    <section>
      <h2>Examples</h2>
      <div>
        <p>This example shows a setup with a start button and a camera selector
        using the new semantics (microphone is not shown for brevity but is
        equivalent).</p>
        <pre class="example">
&lt;button id="start"&gt;Start&lt;/button&gt;
&lt;button id="chosenCamera" disabled&gt;Camera: none&lt;/button&gt;
&lt;script&gt;

let cameraTrack = null;

start.onclick = async () => {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({
      video: {deviceId: localStorage.cameraId}
    });
    setCameraTrack(stream.getVideoTracks()[0]);
  } catch (err) {
    console.error(err);
  }
}

chosenCamera.onclick = async () => {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({
      video: true,
      semantics: "user-chooses"
    });
    setCameraTrack(stream.getVideoTracks()[0]);
  } catch (err) {
    console.error(err);
  }
}

function setCameraTrack(track) {
  cameraTrack = track;
  const {deviceId} = track.getSettings();
  localStorage.cameraId = deviceId;
  // getSettings() does not expose a label; read it from the track instead.
  chosenCamera.innerText = `Camera: ${track.label}`;
  chosenCamera.disabled = false;
}
&lt;/script&gt;
        </pre>
      </div>
    </section>
  </section>
  <section>
    <h2>Transferable MediaStreamTrack</h2>
    <div>
      <p>A {{MediaStreamTrack}} is a <a data-cite="!HTML/#transferable-objects">transferable object</a>.
      This allows manipulating real-time media outside the context it was requested or created in,
      for instance in workers or third-party iframes.</p>
      <p>To preserve the existing privacy and security infrastructure, in particular for capture tracks,
      the track source lifetime management remains tied to the context that created it.
      The transfer algorithm MUST ensure the following behaviors:</p>
      <ol>
        <li><p>The context named <var>originalContext</var> that created a track named <var>originalTrack</var> remains in control
        of the <var>originalTrack</var> source, named <var>trackSource</var>, even when <var>originalTrack</var> is transferred into <var>transferredTrack</var>.</p>
        </li>
        <li>
          <p>In particular, <var>originalContext</var> remains the proxy to privacy indicators of <var>trackSource</var>.
          <var>transferredTrack</var> and any of its clones are considered to be tracks using <var>trackSource</var>
          as if they were tracks created in and controlled by <var>originalContext</var>.</p>
        </li>
        <li><p>When <var>originalContext</var> goes away, <var>trackSource</var> gets ended, thus <var>transferredTrack</var> gets ended.</p></li>
        <li><p>When <var>originalContext</var> would have muted/unmuted <var>originalTrack</var>, <var>transferredTrack</var> gets muted/unmuted.</p></li>
        <li><p>If <var>transferredTrack</var> is cloned into <var>transferredTrackClone</var>, <var>transferredTrackClone</var> is tied to <var>trackSource</var>.
        It is not tied to <var>originalTrack</var> in any way.</p></li>
        <li><p>If <var>transferredTrack</var> is transferred into <var>transferredAgainTrack</var>, <var>transferredAgainTrack</var> is tied to <var>trackSource</var>.
        It is not tied to <var>transferredTrack</var> or <var>originalTrack</var> in any way.</p></li>
      </ol>
    </div>
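    <div>
      <p>From an application's point of view, transferring a track could look
      roughly like the sketch below. The function names and the message shape
      (<code>{track}</code>) are hypothetical; <code>target</code> is assumed
      to be a worker, message port, or similar object offering
      <code>postMessage(message, transferList)</code>.</p>
      <pre class="example">
```javascript
// Sending side: hands a track to another context by listing it as a
// transferable. After this call the local `track` is detached.
function transferTrack(target, track) {
  target.postMessage({track}, [track]);
}

// Receiving side (for example inside the worker's message handler):
// picks the transferred track out of the message event.
function receiveTrack(event) {
  return event.data.track;
}
```
      </pre>
    </div>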
    <div>
      <p>The WebIDL changes are the following:</p>
      <pre class="idl"
>[Exposed=(Window,Worker), Transferable]
partial interface MediaStreamTrack {
};</pre>
    </div>
    <div>
      <p>At creation of a {{MediaStreamTrack}} object, called <var>track</var>, run the following steps:</p>
      <ol>
        <li><p>Initialize <var>track</var>.`[[IsDetached]]` to <code>false</code>.</p></li>
      </ol>
    </div>
    <div>
      <p>The {{MediaStreamTrack}} <a data-cite="!HTML/#transfer-steps">transfer steps</a>, given <var>value</var> and <var>dataHolder</var>, are:</p>
      <ol>
        <li><p>If <var>value</var>.`[[IsDetached]]` is <code>true</code>, throw a "DataCloneError" DOMException.</p></li>
        <li><p>Set <var>dataHolder</var>.`[[id]]` to <var>value</var>.{{MediaStreamTrack/id}}.</p></li>
        <li><p>Set <var>dataHolder</var>.`[[kind]]` to <var>value</var>.{{MediaStreamTrack/kind}}.</p></li>
        <li><p>Set <var>dataHolder</var>.`[[label]]` to <var>value</var>.{{MediaStreamTrack/label}}.</p></li>
        <li><p>Set <var>dataHolder</var>.`[[readyState]]` to <var>value</var>.{{MediaStreamTrack/readyState}}.</p></li>
        <li><p>Set <var>dataHolder</var>.`[[enabled]]` to <var>value</var>.{{MediaStreamTrack/enabled}}.</p></li>
        <li><p>Set <var>dataHolder</var>.`[[muted]]` to <var>value</var>.{{MediaStreamTrack/muted}}.</p></li>
        <li><p>Set <var>dataHolder</var>.`[[source]]` to <var>value</var>'s underlying source.</p></li>
        <li><p>Set <var>dataHolder</var>.`[[constraints]]` to <var>value</var>'s active constraints.</p></li>
        <li><p>Set <var>value</var>.`[[IsDetached]]` to <code>true</code>.</p></li>
        <li><p>Set <var>value</var>.{{MediaStreamTrack/[[ReadyState]]}} to <a data-cite="!mediacapture-streams/#track-ended">"ended"</a> (without stopping the underlying source or firing an `ended` event).</p></li>
      </ol>
    </div>
    <div><p>The {{MediaStreamTrack}} <a data-cite="!HTML/#transfer-receiving-steps">transfer-receiving steps</a>, given <var>dataHolder</var> and <var>track</var>, are:</p>
      <ol>
        <li><p>Initialize <var>track</var>.{{MediaStreamTrack/id}} to <var>dataHolder</var>.`[[id]]`.</p></li>
        <li><p>Initialize <var>track</var>.{{MediaStreamTrack/kind}} to <var>dataHolder</var>.`[[kind]]`.</p></li>
        <li><p>Initialize <var>track</var>.{{MediaStreamTrack/label}} to <var>dataHolder</var>.`[[label]]`.</p></li>
        <li><p>Initialize <var>track</var>.{{MediaStreamTrack/readyState}} to <var>dataHolder</var>.`[[readyState]]`.</p></li>
        <li><p>Initialize <var>track</var>.{{MediaStreamTrack/enabled}} to <var>dataHolder</var>.`[[enabled]]`.</p></li>
        <li><p>Initialize <var>track</var>.{{MediaStreamTrack/muted}} to <var>dataHolder</var>.`[[muted]]`.</p></li>
        <li><p>[=MediaStreamTrack/Initialize the underlying source=] of <var>track</var> to <var>dataHolder</var>.`[[source]]` with
          [=MediaStreamTrack/Initialize the underlying source/tieSourceToContext=] equal to <code>false</code>.</p></li>
        <li><p>Set <var>track</var>'s constraints to <var>dataHolder</var>.`[[constraints]]`.</p></li>
      </ol>
    </div>
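    <div>
      <p>The two algorithms above can be modeled on plain objects as a rough
      illustration. The objects below are hypothetical stand-ins rather than
      real {{MediaStreamTrack}} instances, the property names merely mirror the
      internal slots used above, and the handling of
      <code>tieSourceToContext</code> is left out of the sketch.</p>
      <pre class="example">
```javascript
// Fields copied from the track into the data holder and back.
const COPIED = ["id", "kind", "label", "readyState", "enabled", "muted",
                "source", "constraints"];

// Models the transfer steps: snapshot the track, then detach it.
function transferSteps(value) {
  if (value.isDetached) {
    throw new DOMException("track is detached", "DataCloneError");
  }
  const dataHolder = {};
  for (const name of COPIED) {
    dataHolder[name] = value[name];
  }
  value.isDetached = true;
  // The original now reports "ended", but its source is not stopped and
  // no "ended" event is fired.
  value.readyState = "ended";
  return dataHolder;
}

// Models the transfer-receiving steps: the new track starts attached and
// uses the same underlying source as the original.
function transferReceivingSteps(dataHolder) {
  return {isDetached: false, ...dataHolder};
}
```
      </pre>
    </div>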
    <div>
      <p>The underlying source is expected to be kept alive between the transfer and transfer-receiving steps, or for as long as the data holder is alive.
      In a sense, between these steps, the data holder is attached to the underlying source as if it were a track.</p>
    </div>
  </section>
</body>
</html>