feat: Update CEF to 117 #1499

Merged
merged 11 commits into from
Nov 27, 2023

Julusian
Member

@Julusian Julusian commented Oct 6, 2023

This work was sponsored by NRK.

Related #1235
Closes #1498
Fixes #1265
Fixes #1454
Fixes #1387
Fixes #1441
And others?

It has been a few years since I last evaluated whether the shared-texture flow in CEF was necessary. I started off looking into this as I was curious how linux performed, which has never had shared-texture support.

Initial tests were run on a machine with a 7950x (dual channel ddr5) and GTX1060, using https://www.testufo.com/ as the page to produce constant fluid motion. I did try some webgl demos at a couple of points, but results were not noticeably affected. All tests used a single channel of 1080i5000. From memory, the cost is the same no matter how many channels the producers are spread across, but this needs verifying.

Using the CEF95 in master:

| | CEF95 windows | CEF95 linux | CEF117 windows | CEF117 linux |
|---|---|---|---|---|
| gpu disabled | 18 | 12 | 18 | 17 |
| gpu enabled (with shared) | 8 | - | - | - |
| gpu enabled (no shared) | 5 | 12 | 13 | 15 |

Also worth noting, CEF95 gpu-enabled without shared-textures flooded the console with errors from CEF.
At one point I was having issues with CEF117 crashing on linux, that appears to have stopped happening now.

From this, it looks like there is no big change in gpu-disabled behaviour (a change of 1 could be due to my testing method), but gpu-enabled (no shared) is performing much better, and looks to negate the need for shared-textures.

To verify this, I ran the same test on another machine (i9-10920X & Quadro P4000, quad channel DDR4):

| | CEF71 windows | CEF95 windows | CEF95 linux | CEF117 windows | CEF117 linux |
|---|---|---|---|---|---|
| gpu disabled | 7 | 15 | 16 | 14 | 16 |
| gpu enabled (with shared) | - | 8 | - | - | - |
| gpu enabled (no shared) | 7 | 2 | 12 | 9 | 14 |

And another (i7-7800X & RTX 3080Ti, dual channel DDR4):

| | CEF71 windows | CEF95 windows | CEF117 windows |
|---|---|---|---|
| gpu disabled | 4 | 6 | 5 |
| gpu enabled (with shared) | 6 | 7 | - |
| gpu enabled (no shared) | ? | ? | 5 |

And another (xeon W-2133 & Quadro P4000, dual channel DDR4):

| | CEF71 windows | CEF95 windows | CEF117 windows |
|---|---|---|---|
| gpu disabled | 3 | 4 | 4 |
| gpu enabled (with shared) | 5 | 6 | - |
| gpu enabled (no shared) | 0 | ? | 3 |

While not a perfect match, depending on the machine used performance can be better than before.

Investigating this, I discovered that the slowness with gpu enabled (no shared) was due to the memcpy we perform to copy the CEF pixel buffer into our gpu backed buffer. It was notably slower in this case than with gpu disabled, but wrapping the copy in a tbb::parallel_for improves performance such that it rivals the gpu disabled flow. Making the same change when gpu disabled or on linux has no noticeable impact, so it is not done in either of those cases.
This performance optimisation was made only for the CEF117 builds, as this PR is aiming to ensure it doesn't degrade performance compared to the current builds/releases of casparcg.
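
To illustrate the idea of splitting one large copy across several threads, here is a minimal sketch. It is not the code from this PR: the PR wraps the copy in tbb::parallel_for, while this sketch swaps in plain std::thread so that it builds with only the standard library. The helper name `parallel_copy` and the default worker count are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical helper (not the CasparCG implementation): copy `size` bytes
// from `src` to `dst`, splitting the buffer into contiguous chunks that are
// each copied with memcpy on their own thread.
inline void parallel_copy(std::uint8_t* dst, const std::uint8_t* src, std::size_t size, unsigned workers = 4)
{
    assert(size == 0 || (dst != nullptr && src != nullptr));
    workers = std::max(1u, workers);

    // Round the chunk size up so that at most `workers` chunks cover the buffer.
    const std::size_t chunk = (size + workers - 1) / workers;

    std::vector<std::thread> threads;
    for (std::size_t offset = 0; offset < size; offset += chunk) {
        // The last chunk may be shorter than the others.
        const std::size_t len = std::min(chunk, size - offset);
        threads.emplace_back([=] { std::memcpy(dst + offset, src + offset, len); });
    }

    // Wait for every chunk to finish before the destination is used.
    for (auto& t : threads)
        t.join();
}
```

A real implementation would more likely reuse a thread pool (which is what TBB provides) rather than spawn threads per frame; the sketch only shows how the chunks are laid out.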

For this test I have been focusing on the number of producers that can be run, and not at all on the actual resource usage. I expect that this change will cost more, especially in system memory, but at this point the question is whether we want to accept the increased resource requirement, or stick with a 2021 version of CEF for the foreseeable future. There are known issues with the current CEF integration (#1454), causing some users to stick with the previous version we had, CEF71 from 2018.
The unfortunate situation is that the shared-texture support in CEF is broken, and has been for over 2 years. Various people have been fixing it for each new release of CEF, most recently OBS. But as of CEF 104 from mid 2022, the patch needs to be rewritten from scratch, which is likely to take quite a while to happen. As it looks like we do not need (but we would like it, once possible again) the shared-texture support, I think it best to drop it for now so that we can update CEF.
I have tried again at making CEF 103 work for us, starting from the work done by OBS, but getting the D3D to GL interop to not lose frames is proving too challenging. And as that version is already over a year old, how long would we stick to it for?

Windows build: https://builds.julusian.dev/casparcg/casparcg-server-cef117-upstream-2842bc616bb9817547b775eae466d330050d0025-windows.zip

@Sidonai-1

Thank you for the research, I am very interested in trying the build once it's merged.

From your post and what I could understand from googling a bit: the missing "shared texture" will only affect performance in certain cases, is that correct? Or are there any HTML features/workflows that will be directly affected or non-functional in its absence?

I am trying to understand what this change entails, to decide whether to remain on the old version with shared-texture support or just update without fear of the old templates breaking.

Thanks!

@Julusian
Member Author

Julusian commented Oct 9, 2023

When gpu-enabled is set to true in the config, that means that CEF will be doing its compositing on the GPU (on windows using D3D).

When using the shared textures flow, CEF passes us a handle to that frame (still on the GPU). We are then able to tell the GPU to adopt that frame into OpenGL and make a copy of the frame. During this, the frame will stay entirely within the GPU.

Without shared textures, CEF downloads the frame to CPU memory and passes us those CPU-located pixels. We then perform a copy and send that back to the GPU.

So the difference becomes:
With shared textures: GPU -> GPU (with OBS CEF 103, that is GPU -> GPU -> GPU, and might need another step adding)
Without shared textures: GPU -> CPU -> CPU -> GPU
In both cases there could be extra copies going on that I am not aware of, but they should only be within a 'zone'

Copying via the CPU is problematic for two reasons. Firstly, it requires more copies to be done, which costs CPU time. Secondly, going between the GPU and CPU consumes PCIE bandwidth.
Doing some quick maths, a gen 3 16-lane PCIE slot (typical GPU) can do 15760MB/s. With a single 1080p frame being ~8MB, at 50p that means that a little over 39 producers will saturate the PCIE bus (assuming no other overhead). If doing 4k50, that becomes a little over 9.
More load will also be put on system memory, but as that is a lot faster than PCIE, it is less likely to be an issue. If it is, that is likely because there are unpopulated memory channels.
If using a PCIE 4 gpu and motherboard, the bandwidth is doubled, as is the number of producers. Double it again for gen5.
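
The quick maths above can be reproduced with a small helper. This is only a sketch using the rounded figures quoted above (15760MB/s for a gen 3 16-lane slot, ~8MB per 1080p frame). The function and constant names are made up for illustration, and all other bus traffic is ignored.

```cpp
#include <cassert>

// Hypothetical helper: how many producers fit on the bus if each one moves
// one frame of `frame_mb` megabytes across it `fps` times per second.
inline double max_producers(double bus_mb_per_s, double frame_mb, double fps)
{
    assert(frame_mb > 0.0 && fps > 0.0);
    return bus_mb_per_s / (frame_mb * fps);
}

// Rounded figures taken from the comment above, with no other overhead assumed.
constexpr double pcie_gen3_x16_mb_per_s = 15760.0;
constexpr double frame_1080p_mb         = 8.0;                  // roughly 1920 * 1080 * 4 bytes
constexpr double frame_2160p_mb         = 4.0 * frame_1080p_mb; // four times the pixels of 1080p
```

With these figures, `max_producers(pcie_gen3_x16_mb_per_s, frame_1080p_mb, 50.0)` gives 39.4 and the 4k50 case gives 9.85, matching the estimates above. Doubling the bandwidth argument for gen 4 doubles both results.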

A caveat here is that if using an onboard gpu (one built into the cpu), because it shares system memory, doing the copy via cpu most likely has very little cost. I expect this environment would not be very representative of production deployments. As such, I have not done any testing here.

I should probably repeat the same test in 4K50, to make sure that it remains similarly usable.


As far as I'm aware, nothing should break with this, other than a degradation in performance due to the change in copying. Whether that change is noticeable is what my measurements in the PR description were trying to figure out.
It is entirely possible that some templates may break a little, due to the jump in chromium version, but if that happens it is unavoidable and should be unrelated to whether shared-textures is in use or not. If concerned, they will behave the same as in normal chrome/chromium

@Sidonai-1

Thank you for the detailed explanation. It makes sense... I now understand how the lack of shared textures can easily become a headache.

@Julusian Julusian marked this pull request as ready for review November 21, 2023 16:19
@Julusian Julusian changed the title from "WIP: Update CEF to 117" to "feat: Update CEF to 117" Nov 22, 2023
@Julusian Julusian linked an issue Nov 24, 2023 that may be closed by this pull request
@Julusian Julusian merged commit f561c94 into master Nov 27, 2023
8 checks passed
@rrebuffo

I've been testing some templates in 2.4-beta1 vs 2.4-dev and it seems that enable-gpu no longer works in beta1.
I tried playing a hefty template that uses a lot of GPU and I found that it doesn't use 3D features of the GPU at all:

  • 3D GPU usage in 2.4-beta1: 18%
  • 3D GPU usage in 2.4-dev: 67%

I captured a video to show the results:
https://www.youtube.com/watch?v=5xDlA70Fyj8
The beginning of the video shows the template working properly (2.4-dev with GPU enabled); after that it is playing at ~4fps.

@Julusian
Member Author

@rrebuffo are you running on windows or linux?
what gpu are you using?
what resolution is this running at?

CEF should still be gpu enabled, but I did tell it to use a different backend which could have an impact on this. It seemed fine in my testing though

@rrebuffo

Sorry for the lack of context.

My development rig is:
Intel Core i9-10900X
4 channel 64GB 3600MHz
Nvidia 1050Ti GPU
Windows 10 22H2

I'm running a custom resolution: 3584x768

I could prepare the template for you to test. It's heavy on SVGs and masks so it's a good candidate to stress chrome.

@Julusian
Member Author

Julusian commented Dec 4, 2023

@rrebuffo yes a copy of the template would be helpful
Either pm me on the forum, or email it to [email protected]

@Julusian Julusian deleted the feat/cef-117 branch December 21, 2023 14:23
@Julusian
Member Author

Julusian commented Dec 21, 2023

@rrebuffo I've taken a look at this, and it seems to be a consequence of some of the performance tweaks made.
Primarily, the switch of the angle backend used by CEF is the problem here. I suspect it's a case of the GL backend working better when running many simpler templates, while the default backend (d3d11?) is better for fewer, heavier templates.
So this is now exposed as a config field.

Try using this with the next beta build, and it should run better:

<html>
    <enable-gpu>true</enable-gpu>
    <angle-backend></angle-backend>
</html>

@Sidonai-1

> @rrebuffo I've taken a look at this, and it seems to be a consequence of some of the performance tweaks made. Primarily, the switch of the angle backend used by CEF is the problem here. I suspect its a case of the GL backend works better when running many simpler templates, but the default backend (d3d11?) is better for fewer more heavy templates. So this is now exposed as a config field.
>
> Try using this with the next beta build, and it should run better:
>
> <html>
>     <enable-gpu>true</enable-gpu>
>     <angle-backend></angle-backend>
> </html>

I tried to test this with the 2023-12-22 15:16 build but I couldn't make it show anything on screen. The HTML shows up in the DIAG, but there is no output. I tried GPU true and false, and backend blank, gl, d9 and d11. I didn't have an SDI output at the time, just the screen consumer.

@Julusian
Member Author

@Sidonai-1 should be better now. I appear to have been clumsy when doing #1390 and broke most producers (all except ffmpeg).
