Add background segmentation mask #142
Thanks @eehakkin. In many cases it is important to have access to the original camera feed, so BG MASK keeps the original frames intact, performs segmentation, and provides mask frames in addition to the original video frames. Web applications thus receive both the original frames and the mask frames in the same video frame stream. This PR follows up our presentation of BG Segmentation MASK in the monthly WebRTC WG call [Minutes].
I think the general thrust of this effort is very useful for Web applications.
index.html
Outdated
<p>A background segmentation mask with
white denoting certainly foreground,
black denoting certainly background and
grey denoting uncertainty.</p>
Is it really only "uncertainty" that's represented? Is it perhaps sometimes partial transparency, and sometimes ambiguity?
Could anything be said here to clarify that shades of grey tend more towards the foreground/background based on being lighter/darker?
Done.
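To make the grey-level question above concrete, here is a minimal sketch of how an app might read a single mask sample. It assumes an 8-bit sample where 0 is certainly background and 255 is certainly foreground, and that lighter greys lean towards the foreground; the helper names `foregroundWeight` and `classifyMaskSample` are hypothetical and not part of the PR.

```typescript
// Hypothetical helper: map an 8-bit mask sample to a foreground weight in [0, 1].
// Assumption: 0 = certainly background, 255 = certainly foreground.
function foregroundWeight(maskValue: number): number {
  const clamped = Math.min(255, Math.max(0, maskValue));
  return clamped / 255;
}

// Hypothetical three-way reading of a mask sample, following the
// white / black / grey wording of the spec text under review.
type MaskSampleClass = "background" | "uncertain" | "foreground";

function classifyMaskSample(maskValue: number): MaskSampleClass {
  if (maskValue <= 0) return "background";
  if (maskValue >= 255) return "foreground";
  return "uncertain";
}
```

Under this reading, a sample of 200 is still "uncertain" but carries more foreground weight than a sample of 50.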
index.html
Outdated
<h3>VideoFrame interface extensions</h3>
<pre class="idl">
partial interface VideoFrame {
  readonly attribute VideoFrame? backgroundSegmentationMask;
I imagine this isn't going to suffer infinite recursion because the second layer deep will be guaranteed nullable. But it still strikes me as a bit odd to expose a full VideoFrame here, with all its present and future fields, when what we really wish to get is a matrix of integer values of a limited range.
Yes, recursion is definitely not wanted.
While I by no means insist on VideoFrame, I think that it is beneficial if the background segmentation mask can be passed directly to, for instance, Canvas.drawImage() or similar.
Additionally, because usages of background segmentation masks are manifold (it could be post-processed remotely, locally on CPU or on GPU, etc.) and sources and pre-processing could vary (maybe the source is a boolean matrix or an integer matrix or a GPU texture), it would be good IMHO if the API didn't enforce a particular storage or representation. A VideoFrame is good in that respect.
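As one illustration of local CPU post-processing, the sketch below uses the mask as an alpha channel. It assumes both the frame and the mask have already been drawn to a canvas and read back as RGBA buffers of the same size (for example via `getImageData().data`), and that the mask is grey so its red channel carries the value. The function name `applyMaskAsAlpha` is hypothetical.

```typescript
// Hypothetical CPU-side compositing step: copy the frame and replace each
// pixel's alpha with the mask's grey level (read from the mask's red channel).
// Assumption: both buffers are RGBA, 4 bytes per pixel, same dimensions.
function applyMaskAsAlpha(
  frameRgba: Uint8ClampedArray,
  maskRgba: Uint8ClampedArray,
): Uint8ClampedArray {
  if (frameRgba.length !== maskRgba.length || frameRgba.length % 4 !== 0) {
    throw new RangeError("frame and mask must be RGBA buffers of equal size");
  }
  const out = new Uint8ClampedArray(frameRgba); // copy, leave the original intact
  for (let i = 0; i < out.length; i += 4) {
    out[i + 3] = maskRgba[i] ?? 0; // white mask => opaque, black mask => transparent
  }
  return out;
}
```

The original frame buffer is left untouched, which matches the goal of keeping the unmodified camera frames available alongside the mask.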
Why is the attribute readonly? If JS wishes to modify the background segmentation mask of a frame, how can you do it? Create a new video frame with a new segmentation mask member? How is that passed to the video frame constructor?
Note VideoFrame is defined by the Media WG, so I think this needs to be discussed there.
Unless we make backgroundSegmentationMask metadata? Either way, we should involve the Media WG here based on w3c/webcodecs#607 (comment).
If JS wishes to modify the background segmentation mask of a frame, how can you do it? Create a new video frame with a new segmentation mask member?
These are good questions I suspect the Media WG can answer. They made VideoFrame and its metadata immutable and define its interaction model.
Like @eladalon1983 I find it odd to expose a full VideoFrame for a mask.
Why is the attribute readonly? If JS wishes to modify the background segmentation mask of a frame, how can you do it? Create a new video frame with a new segmentation mask member? How is that passed to the video frame constructor?
I made backgroundSegmentationMask metadata. That resolves the issue, I think.
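A rough sketch of what reading the mask from frame metadata could look like, assuming the mask ends up as an optional `backgroundSegmentationMask` member of the dictionary returned by the frame's `metadata()` method. The structural types `MaskMetadata` and `FrameWithMetadata` and the helper `getSegmentationMask` are hypothetical stand-ins so the sketch does not depend on any particular mask type.

```typescript
// Hypothetical structural types: only the parts of a frame this sketch needs.
// M is whatever type the mask ends up having in the spec.
interface MaskMetadata<M> {
  backgroundSegmentationMask?: M;
}

interface FrameWithMetadata<M> {
  metadata(): MaskMetadata<M>;
}

// Return the mask if the frame carries one, or null otherwise
// (for example when the constraint is off or the platform has no support).
function getSegmentationMask<M>(frame: FrameWithMetadata<M>): M | null {
  const mask = frame.metadata().backgroundSegmentationMask;
  return mask === undefined ? null : mask;
}
```

Because the metadata is read-only here, an app that wants a modified mask would work on its own copy rather than on the frame.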
};

partial dictionary MediaTrackConstraintSet {
  ConstrainBoolean backgroundSegmentationMask;
Would it ever be interesting and feasible to tweak the parameters by which segmentation is done?
At least on Windows, the platform model does not allow tweaking segmentation parameters today. Using tensorflow.js with the BodyPix model for Blur, I see there's at least a segmentationThreshold parameter. Maybe it's the same as foregroundThresholdProbability with the MediaPipeSelfieSegmentation model?
Did you have some other parameters in mind?
Did you have some other parameters in mind?
I am not knowledgeable enough on what parameters would be best to include. I was mostly wondering if this is something we foresee extending from a boolean to a set of parameters, and if so, whether there was a viable path for such future extensions given the current API shape.
In the Media Capture API, the parameter space is flat and not hierarchical.
As an example, there is a constrainable property called whiteBalanceMode which can be constrained to manual. If one then wants to manually change the white balance, there is a constrainable property called colorTemperature which can be constrained separately in order to do that.
So if we later would like to add a numeric constrainable property called backgroundSegmentationThreshold (which could change the segmentation mask to be pre-processed to a black and white mask according to the threshold, without shades of grey) or a string constrainable property called backgroundSegmentationModel (to use a particular AI model), we could certainly do that.
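A small sketch of the flat shape described above. `backgroundSegmentationThreshold` is only the possible future property floated in this comment, not something the PR defines, and `MaskConstraintSet`, `buildMaskConstraints` and `thresholdMask` are hypothetical names. The second function shows what thresholding a grey mask to black and white would mean, assuming a threshold in [0, 1].

```typescript
// Hypothetical flat constraint set: each knob is its own top-level property.
interface MaskConstraintSet {
  backgroundSegmentationMask?: boolean;
  backgroundSegmentationThreshold?: number; // hypothetical future property
}

// Build a flat constraint set; the threshold is only added when the mask is on.
function buildMaskConstraints(enabled: boolean, threshold?: number): MaskConstraintSet {
  const set: MaskConstraintSet = { backgroundSegmentationMask: enabled };
  if (enabled && threshold !== undefined) {
    set.backgroundSegmentationThreshold = threshold;
  }
  return set;
}

// Turn a grey mask (one byte per sample) into a black and white one.
// Assumption: threshold is a foreground probability in [0, 1].
function thresholdMask(mask: Uint8ClampedArray, threshold: number): Uint8ClampedArray {
  const cutoff = threshold * 255;
  return mask.map((value) => (value >= cutoff ? 255 : 0));
}
```

Adding the threshold later would not change the meaning of the existing boolean, which is the point of keeping the space flat.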
By the way, having spoken to some people who work on camera effects in video-conferencing applications, I have some more feedback. (Not sure if this has been discussed in the past.) Video conferencing applications often have to be very careful about what models they use, two interesting reasons being:
I am getting the feeling that, if we want serious Web apps to use this valuable work, it might be necessary to also expose something about the underlying model. I am not sure what the MVP is in that regard; possibly even just some stable identifier that apps can use against an allowlist of models/implementation that they had vetted and found sufficient?
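For illustration only, this is roughly how an app could use such an identifier if one were exposed. No model identifier exists in the PR; the `modelId` parameter, the example identifier strings and the helper `isVettedModel` are all hypothetical.

```typescript
// Hypothetical check: accept the platform mask only if the (hypothetical)
// model identifier is on the app's own allowlist of vetted models.
function isVettedModel(
  modelId: string | undefined,
  allowlist: ReadonlySet<string>,
): boolean {
  return modelId !== undefined && allowlist.has(modelId);
}

// Example allowlist with made-up identifiers.
const vettedModels: ReadonlySet<string> = new Set([
  "example-vendor/segmenter-v1",
  "example-vendor/segmenter-v2",
]);
```

An app could fall back to its own model whenever the check fails, including when no identifier is exposed at all.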
Good feedback. The way we plan to implement this API today on Chrome/Edge is to use the platform models that currently ship by default on the underlying OS. If you are making a native app today without bringing your own models, you will likely use what the platform provides. I think when users bring their own models, this is a serious issue to consider. Also, when major platforms ship efficient on-device models by default in the OS, does it make sense for every app to bring its own segmentation models? Differentiation vs. efficiency trade-offs. Consistency:
I think we should spin off the discussion about identifying the model (or some of its properties) out of this PR and into an issue. Just some quick clarifications, though.
@aboba: Is it possible to share more information on how Microsoft does due diligence before putting models in the OS?
Very true. I am expecting platform vendors to update models (maybe via drivers or OS updates) as hardware becomes more capable.
Even if a video conferencing app runs on {UA, UA-version, OS, OS-version}, it might still not know definitively which model is used, as that might be subject to experiments, out-of-band updates, etc. Apps might require more information exposed to them about the segmentation model before they can use it.
I see that the approach has changed to an extra video frame argument. I think this is a better approach, but I still have questions.
Why is the attribute readonly? If JS wishes to modify the background segmentation mask of a frame, how can you do it? Create a new video frame with a new segmentation mask member? How is that passed to the video frame constructor?
Hi @riju, does this PR resolve an open issue? If not, can you open one?
I see this PR modifies VideoFrame which is defined by the Media WG, so I think those parts need to be discussed there.
I think this could use their expertise.
This API was discussed in https://www.w3.org/2024/04/23-webrtc-minutes.html#t08
I replaced I should later add also an example, I think.
Note that this still requires registering the metadata, as was done in w3c/webcodecs#607.
…d settings

Some platforms or User Agents may provide built-in support for background segmentation mask, in particular for camera video streams. Web applications may want to control the generation of the background segmentation mask and to leverage on it for background segmentation while still having access to the original unmodified video frames.

Background segmentation mask feature is behind a flag:
chrome --enable-blink-features=MediaCaptureCameraControls

Intent to Prototype: https://groups.google.com/a/chromium.org/g/blink-dev/c/nWEqxi83rus
Spec: w3c/mediacapture-extensions#142
Explainer: https://github.com/riju/backgroundBlur/blob/main/explainer.md#background-segmentation-mask-api

Bug: 349939554
Change-Id: Iaa8d4a7bfd9d53eb70ca7055bd18ba6125245a55
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5665731
Reviewed-by: Rijubrata Bhaumik <[email protected]>
Reviewed-by: Mike West <[email protected]>
Commit-Queue: Eero Hakkinen <[email protected]>
Reviewed-by: Guido Urdaneta <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1339674}
This CL fixes background segmentation mask constraints to be passed to the imagecapture module and adds a web platform test to test background segmentation mask constraints and settings.

Background segmentation mask feature is behind a flag:
chrome --enable-blink-features=MediaCaptureCameraControls

Intent to Prototype: https://groups.google.com/a/chromium.org/g/blink-dev/c/nWEqxi83rus
Spec: w3c/mediacapture-extensions#142
Explainer: https://github.com/riju/backgroundBlur/blob/main/explainer.md#background-segmentation-mask-api

Bug: 349939554
Change-Id: I1c11bd8919272147ed28f699a38dd8922cefc4c8
This CL fixes background segmentation mask constraints to be passed to the imagecapture module and adds a web platform test to test background segmentation mask constraints and settings.

Background segmentation mask feature is behind a flag:
chrome --enable-blink-features=MediaCaptureCameraControls

Intent to Prototype: https://groups.google.com/a/chromium.org/g/blink-dev/c/nWEqxi83rus
Spec: w3c/mediacapture-extensions#142
Explainer: https://github.com/riju/backgroundBlur/blob/main/explainer.md#background-segmentation-mask-api

Bug: 349939554
Change-Id: I1c11bd8919272147ed28f699a38dd8922cefc4c8
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5783519
Reviewed-by: Rijubrata Bhaumik <[email protected]>
Commit-Queue: Eero Hakkinen <[email protected]>
Reviewed-by: Guido Urdaneta <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1341202}
… segmentation mask constraints, a=testonly

Automatic update from web-platform-tests
[MediaCapture Extensions] Fix background segmentation mask constraints

This CL fixes background segmentation mask constraints to be passed to the imagecapture module and adds a web platform test to test background segmentation mask constraints and settings.

Background segmentation mask feature is behind a flag:
chrome --enable-blink-features=MediaCaptureCameraControls

Intent to Prototype: https://groups.google.com/a/chromium.org/g/blink-dev/c/nWEqxi83rus
Spec: w3c/mediacapture-extensions#142
Explainer: https://github.com/riju/backgroundBlur/blob/main/explainer.md#background-segmentation-mask-api

Bug: 349939554
Change-Id: I1c11bd8919272147ed28f699a38dd8922cefc4c8
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5783519
Reviewed-by: Rijubrata Bhaumik <[email protected]>
Commit-Queue: Eero Hakkinen <[email protected]>
Reviewed-by: Guido Urdaneta <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1341202}
--
wpt-commits: 732899be1aa3e1efcb01cab15852eb4c05bfe9b5
wpt-pr: 47585
… segmentation mask constraints, a=testonly

Automatic update from web-platform-tests
[MediaCapture Extensions] Fix background segmentation mask constraints

This CL fixes background segmentation mask constraints to be passed to the imagecapture module and adds a web platform test to test background segmentation mask constraints and settings.

Background segmentation mask feature is behind a flag:
chrome --enable-blink-features=MediaCaptureCameraControls

Intent to Prototype: https://groups.google.com/a/chromium.org/g/blink-dev/c/nWEqxi83rus
Spec: w3c/mediacapture-extensions#142
Explainer: https://github.com/riju/backgroundBlur/blob/main/explainer.md#background-segmentation-mask-api

Bug: 349939554
Change-Id: I1c11bd8919272147ed28f699a38dd8922cefc4c8
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5783519
Reviewed-by: Rijubrata Bhaumik <rijubrata.bhaumikintel.com>
Commit-Queue: Eero Hakkinen <eero.hakkinenintel.com>
Reviewed-by: Guido Urdaneta <guidouchromium.org>
Cr-Commit-Position: refs/heads/main{#1341202}
--
wpt-commits: 732899be1aa3e1efcb01cab15852eb4c05bfe9b5
wpt-pr: 47585

UltraBlame original commit: 2c630b9a9b00e52c53c4a4871af68a122133d332
Hi!
This adds capabilities, constraints and settings for background segmentation mask. Those are fairly obvious.
For the feature to be useful, the actual background segmentation mask must be provided to web apps. There are various ways to do that:
However, that makes it awkward to process such streams and very unclear how to encode them.
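For the capabilities, constraints and settings part mentioned at the top of this description, a minimal sketch of the app-side check might look like this. It assumes the capability is reported as a list of supported boolean values, as other boolean constrainable properties in this spec are; `MaskCapabilities`, `canEnableMask` and `maskConstraintFor` are hypothetical names.

```typescript
// Hypothetical slice of the capabilities object an app would inspect.
// Assumption: a boolean constrainable property reports the values it supports.
interface MaskCapabilities {
  backgroundSegmentationMask?: boolean[];
}

// True if the track reports that the mask can be switched on.
function canEnableMask(caps: MaskCapabilities): boolean {
  const supported = caps.backgroundSegmentationMask;
  return supported !== undefined && supported.some((value) => value === true);
}

// Build the constraint to apply, or null when the mask is unsupported.
function maskConstraintFor(
  caps: MaskCapabilities,
): { backgroundSegmentationMask: boolean } | null {
  return canEnableMask(caps) ? { backgroundSegmentationMask: true } : null;
}
```

In a page, the argument would come from the video track's `getCapabilities()`, and a non-null result would be passed to the track's `applyConstraints()`.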
/cc @riju