
Trouble encoding a 16k (15360x7680) video #3

Open
mrlando opened this issue Aug 16, 2024 · 21 comments

Comments

@mrlando

mrlando commented Aug 16, 2024

8K videos are fine and fast (9 seconds for a 2-second file), but I'm having trouble encoding a 16K ProRes file to spatial. I did some AI upscaling to get to 16K to see if it looks better.
It is very slow (around 23 minutes for a 2-second 16K file), and it often errors out with: "Cannot finish writing"

There is an app called MVConverter that does the encode much faster (around 25 seconds for a 2-second file), so I know it is possible.
https://mvconverter.app/

If you want I can send you a 16k sample to test.

I'm using version 1.2 of the GUI app and version 0.6.2 of the spatial terminal app.
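For what it's worth, a quick sanity check on those timings (assuming the 8K source is half the width and height of the 16K frame, so a quarter of the pixels; the timings are the ones reported above):

```python
# Rough sanity check on the reported encode times.
# ASSUMPTION: the 8K source is half the width and height of the
# 16K (15360x7680) frame, i.e. one quarter of the pixels.
px_16k = 15360 * 7680
px_8k = (15360 // 2) * (7680 // 2)
pixel_ratio = px_16k / px_8k       # 4x the pixels

t_8k = 9                           # seconds for a 2-second clip at 8K
t_16k = 23 * 60                    # roughly 23 minutes at 16K
time_ratio = t_16k / t_8k          # ~153x the time

print(f"pixels: {pixel_ratio:.0f}x, time: {time_ratio:.0f}x")
```

A 4x increase in pixels producing a roughly 150x increase in encode time suggests the bottleneck is something other than raw pixel count, for example falling off a hardware-accelerated encoding path.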

[screenshot: "Cannot finish writing" error]
@mrlando
Author

mrlando commented Aug 16, 2024

Oh, and when you double-click on the top of the window, the buttons disappear.

@AndrewHazelden
Member

Could you do a quick performance comparison:

If you run Mike Swanson's Spatial CLI tool directly in a macOS terminal session, do you also get the same slow video encoding performance?

If not, then I will have to look at the CLI flags I am using and revise them. 🙂

@mrlando
Author

mrlando commented Aug 17, 2024

I tried the CLI tool and the behaviour looks the same: very slow progress, and it seems to slow down further the more frames it processes.

[screenshot: terminal output]

@AndrewHazelden
Member

My understanding is that the spatial CLI simply calls Apple's native, built-in AVFoundation framework on macOS.

There is a chance the other spatial video converter you are exploring uses a custom encoder toolchain stacked on top of FFmpeg or libav; that is very likely if there are significant differences in performance. On the downside, if that is the case, there might be legal issues around Fraunhofer's MV-HEVC patent license fees when FFmpeg is used, since the open-source project offloads that responsibility to the downstream user, which is you.

@mrlando
Author

mrlando commented Aug 17, 2024

Hmm, OK, so you are saying the issue lies in Apple's AVFoundation framework, and they would need to update it to fix this?

@AndrewHazelden
Member

AndrewHazelden commented Aug 17, 2024

There might be some subtle aspects the API user can refine, relating to whether the Apple Silicon CPU vs. XPU (hybrid CPU + GPU) encoding path is fully tuned. That is not something I have explored, since I only made a wrapper GUI for my part of the spatial video encoding R&D project.

@mrlando
Author

mrlando commented Aug 17, 2024

Perhaps the resolution is too big to be handled by the GPU, and it switches to the CPU?

@AndrewHazelden
Member

The macOS "Activity Monitor" program has a GPU usage history view. That could clarify how much CPU vs GPU load happens at different video frame sizes.

@mrlando
Author

mrlando commented Aug 17, 2024

Yes, there is a significant difference:

8K:
[screenshot: GPU usage history during the 8K encode]

16K:
[screenshot: GPU usage history during the 16K encode, briefly high]

But most of the time this:
[screenshot: GPU usage history during the 16K encode, mostly idle]

@hughred22

Wow, 16K - what camera are you using?!

@mikeswanson

@mrlando ...this sounds more like an issue with the command-line tool. Feel free to drop me a note. If you can share a sample clip, that would be useful. 16K x ?K frames are tough to encode, so it doesn't surprise me that they're causing issues.

@mrlando
Author

mrlando commented Aug 17, 2024

> Wow, 16K - what camera are you using?!

I’m from the future and shot this on my iPhone 18 Pro Ultra Immersive! 😎

No, just kidding! I shot on a rented Canon R5 with the dual fisheye lens at 8K and did an AI upscale.

Excellent meeting you here, Hugh! I have watched a lot of your great videos!

@mrlando
Author

mrlando commented Aug 19, 2024

Hi Mike, I’ve sent you the sample clip yesterday!

Thanks for making your excellent tool!

@mrlando
Author

mrlando commented Aug 22, 2024

I've done some testing, and it seems 15168x7568 is the biggest it will go without choking the spatial 0.6.2 terminal app. That is not too far off 15360x7680.

I did notice the GUI tool displays the bitrate as 0 bps. If I use the terminal app directly, it correctly shows the bitrate (see screenshots).

GUI:
[screenshot: GUI showing a bitrate of 0 bps]

Terminal:
[screenshot: terminal showing the correct bitrate]

@mrlando
Author

mrlando commented Aug 22, 2024

And another thing:

The GUI tool often seems to fail to remove the temp file (at least with the bigger frame sizes). The terminal tool doesn't create a temp file.

@starshippr

starshippr commented Sep 13, 2024

I experienced the same with 16384x8192 at 29.97 fps.

Have you found a solution other than using 15168x7568 (sic! I think you mean 15168x7584)?

BTW, thanks @hughred22 for the sample footage ;-)

Greets,
Phil
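The 7568 vs. 7584 mix-up is easy to make. A tiny hypothetical helper can flag candidate frame sizes that are not 2:1 (a side-by-side 180-degree equirect frame is two square per-eye views) or not divisible by 16; the mod-16 constraint is my assumption about HEVC-friendly dimensions, not documented behavior of the spatial tool:

```python
# Hypothetical sanity check for side-by-side 180-degree equirect frame sizes.
# ASSUMPTIONS: the full frame should be 2:1 (two square per-eye views), and
# HEVC encoders generally prefer dimensions divisible by 16.
def check_resolution(width, height):
    notes = []
    if width != 2 * height:
        notes.append("not 2:1")
    if width % 16 or height % 16:
        notes.append("not mod-16 aligned")
    return notes or ["looks sane"]

print(check_resolution(15168, 7568))   # the typo: ['not 2:1']
print(check_resolution(15168, 7584))   # ['looks sane']
print(check_resolution(15360, 7680))   # the "official" 16K: ['looks sane']
```

This makes the slip visible at a glance: 15168 is twice 7584, not twice 7568.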


@mikeswanson

On M1 silicon, there seems to be an upper resolution limit very near 16384x8192 for 10-bit media encoding. Unfortunately, no error is thrown, so it isn't easily caught. If you pass 8-bit media, it will succeed. Most (all?) of the other tools encode to 8-bit behind the scenes, so they complete. I have filed an issue with Apple about this behavior. FWIW, M2+ based systems don't appear to have this limitation.
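Since no error is thrown, a pre-flight check could warn before a long, doomed encode. The threshold below is only an assumption pieced together from the sizes reported in this thread (15168x7584 worked; 15360x7680 and 16384x8192 did not), so treat it as a heuristic, not a documented limit:

```python
# Heuristic pre-flight check for the silent 10-bit failure described above.
# ASSUMPTION: the largest pixel count observed to work in this thread is
# 15168x7584; anything bigger at 10-bit is flagged as risky on M1 hardware.
LARGEST_KNOWN_GOOD = 15168 * 7584

def risky_10bit_encode(width, height, bit_depth):
    """Flag 10-bit frames larger than the biggest size reported to work."""
    return bit_depth > 8 and width * height > LARGEST_KNOWN_GOOD

print(risky_10bit_encode(16384, 8192, 10))   # True
print(risky_10bit_encode(16384, 8192, 8))    # False (8-bit succeeds)
print(risky_10bit_encode(15168, 7584, 10))   # False (reported to work)
```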

@starshippr

@mikeswanson thanks for the clarification! We tried on an M2 Max with 32 GB RAM and the same happens for 16384x8192, but 15168x7584 works fine. We will try the "official" 16K resolution of 15360x7680 again and see what happens.

My M1 with 16 GB also had a hard time with longer 12K clips; the only solution was to make the clip shorter (a few seconds).

@mikeswanson BTW, is there any documentation about what happens to audio during encoding, or what types are supported?

Thanks,
Phil

@starshippr

@mikeswanson what I also noted is that it is pretty hard to see a difference between the original 8K footage and the upscaled 16K version on the AVP. On my monitor I can definitely see a clear difference. Is there some kind of limitation going on while displaying it on the AVP?

It's really impressive to see what Apple Immersive videos achieved with their "magical" 4320x4320 50 Mbps per-eye projection ;-)
But I guess the camera plays a critical role here too!

Phil

@mikeswanson

> @mikeswanson thanks for the clarification! We tried on an M2 Max with 32 GB RAM and the same happens for 16384x8192, but 15168x7584 works fine. We will try the "official" 16K resolution of 15360x7680 again and see what happens.

Also make sure you're using v0.6.2. It works better with large frames on all devices (except for the M1 issue).

> @mikeswanson BTW, is there any documentation about what happens to audio during encoding, or what types are supported?

Other than some mentions in the spatial documentation, no. Audio is always copied, never re-encoded. The formats that are supported for playback are not defined anywhere, so I'd encourage you to experiment. Most of the common formats seemed to work in my testing, though.

@mikeswanson

> @mikeswanson what I also noted is that it is pretty hard to see a difference between the original 8K footage and the upscaled 16K version on the AVP. On my monitor I can definitely see a clear difference. Is there some kind of limitation going on while displaying it on the AVP?

> It's really impressive to see what Apple Immersive videos achieved with their "magical" 4320x4320 50 Mbps per-eye projection ;-) But I guess the camera plays a critical role here too!

There is a lot that goes into making a good-looking image. Certainly too much to easily cover in a short post. But once you've accounted for the distortions in the encoding format, you eventually saturate the displays in the device, and anything over that is essentially wasted effort. Many back-of-the-napkin calculations put that number at less than 8Kx8K per eye (which is a more precise way of talking about resolution) for 180-degree content. You can back into some of those calculations by thinking about pixels-per-degree.
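As a rough illustration of the pixels-per-degree argument. The display-side numbers are public estimates, not Apple specs: roughly 3660 horizontal pixels per eye over an approximately 100-degree horizontal field of view, and the 8K source is assumed to be a 7680-wide side-by-side frame:

```python
# Back-of-the-napkin pixels-per-degree (PPD) comparison.
# Content: per-eye width of a side-by-side equirect frame over 180 degrees.
# Display: ESTIMATED Apple Vision Pro panel width over an ESTIMATED ~100-degree
# horizontal FOV; Apple has not published exact figures.
def ppd(horizontal_pixels, fov_degrees):
    return horizontal_pixels / fov_degrees

content_8k  = ppd(7680 // 2, 180)    # ~21 PPD per eye for an 8K SBS frame
content_16k = ppd(15360 // 2, 180)   # ~43 PPD per eye for the 16K frame
display     = ppd(3660, 100)         # ~37 PPD at the panel (estimate)

print(f"8K content:     {content_8k:.1f} PPD")
print(f"16K content:    {content_16k:.1f} PPD")
print(f"display (est.): {display:.1f} PPD")
```

By this estimate, the 16K upscale's roughly 43 PPD already exceeds the panel's roughly 37 PPD, so part of the extra resolution cannot be displayed, which lines up with Phil's observation that the 8K and 16K versions look similar on the device.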

5 participants