drivers: video: video_stm32_dcmi: Use video buffers for DCMI buffer. #84446

Open. Wants to merge 1 commit into base: main.
Conversation

iabdalkader
Contributor

@iabdalkader iabdalkader commented Jan 23, 2025

Instead of reserving a static (possibly unaligned) buffer for DCMI, this patch reserves and holds one of the video buffers to use as the main DCMI buffer. This buffer will be aligned (using the alignment specified in the config) and will either be allocated from video_common pool or a shared multi-heap (if enabled).
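As a rough illustration of the pattern described above (not the actual Zephyr API; `vbuf_pool`, `pool_init`, and `drv_hold_buffer` are hypothetical stand-ins), the driver takes one aligned buffer out of the shared pool at stream start instead of keeping its own static buffer:

```c
#include <stddef.h>
#include <stdlib.h>

#define POOL_NUM  2
#define BUF_SIZE  64
#define BUF_ALIGN 32

struct vbuf { unsigned char *data; };

/* Stand-in for the video_common buffer pool. */
static struct vbuf vbuf_pool[POOL_NUM];

/* Allocate every pool buffer with the configured alignment,
 * as an aligned pool allocator would. */
static void pool_init(void)
{
	for (int i = 0; i < POOL_NUM; i++) {
		vbuf_pool[i].data = aligned_alloc(BUF_ALIGN, BUF_SIZE);
	}
}

struct driver { struct vbuf *held; int next; };

/* stream_start(): instead of a private static buffer, take one
 * pool buffer and hold it as the DMA target until stream_stop(). */
static struct vbuf *drv_hold_buffer(struct driver *drv)
{
	drv->held = &vbuf_pool[drv->next++];
	return drv->held;
}
```

The held buffer inherits whatever alignment the pool allocator was configured with, which is the point of the patch.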

Collaborator

@josuah josuah left a comment


(Context for other readers:) It seems like the "continuous mode" of DCMI uses a different model where processing happens in the gap between two frames, rather than using a different buffer every time. That could be improved in the future, maybe, but until then, allocating a buffer locally and memcpy()-ing it into every vbuf->data coming from the API seems like the way to go.

See inline comment for a proposal for how to integrate it... Thank you!

Comment on lines 25 to 27
#if defined(CONFIG_VIDEO_BUFFER_USE_SHARED_MULTI_HEAP)
#include <zephyr/multi_heap/shared_multi_heap.h>
#define VIDEO_STM32_HEAP_ALLOC(align, size, timeout) \
shared_multi_heap_aligned_alloc(CONFIG_VIDEO_BUFFER_SMH_ATTRIBUTE, align, size)
#define VIDEO_STM32_HEAP_FREE(block) shared_multi_heap_free(block)
#else
K_HEAP_DEFINE(video_stm32_buffer_pool, CONFIG_VIDEO_BUFFER_POOL_SZ_MAX);
#define VIDEO_STM32_HEAP_ALLOC(align, size, timeout) \
k_heap_aligned_alloc(&video_stm32_buffer_pool, align, size, timeout);
#define VIDEO_STM32_HEAP_FREE(block) k_heap_free(&video_stm32_buffer_pool, block)
#endif
Collaborator


It looks like this snippet is the same as here save for the symbol name:

#define VIDEO_COMMON_HEAP_ALLOC(align, size, timeout) \
shared_multi_heap_aligned_alloc(CONFIG_VIDEO_BUFFER_SMH_ATTRIBUTE, align, size)
#define VIDEO_COMMON_FREE(block) shared_multi_heap_free(block)
#else
K_HEAP_DEFINE(video_buffer_pool, CONFIG_VIDEO_BUFFER_POOL_SZ_MAX*CONFIG_VIDEO_BUFFER_POOL_NUM_MAX);
#define VIDEO_COMMON_HEAP_ALLOC(align, size, timeout) \
k_heap_aligned_alloc(&video_buffer_pool, align, size, timeout);
#define VIDEO_COMMON_FREE(block) k_heap_free(&video_buffer_pool, block)
#endif

How about moving the video_common.c macros (except K_HEAP_DEFINE()) to video_common.h, and adding a buffer_pool argument to specify which buffer pool is targeted?

That way, both video_common.c and video_some_driver.c can use the same implementation.

Suggested change
#if defined(CONFIG_VIDEO_BUFFER_USE_SHARED_MULTI_HEAP)
#include <zephyr/multi_heap/shared_multi_heap.h>
#define VIDEO_STM32_HEAP_ALLOC(align, size, timeout) \
shared_multi_heap_aligned_alloc(CONFIG_VIDEO_BUFFER_SMH_ATTRIBUTE, align, size)
#define VIDEO_STM32_HEAP_FREE(block) shared_multi_heap_free(block)
#else
K_HEAP_DEFINE(video_stm32_buffer_pool, CONFIG_VIDEO_BUFFER_POOL_SZ_MAX);
#define VIDEO_STM32_HEAP_ALLOC(align, size, timeout) \
k_heap_aligned_alloc(&video_stm32_buffer_pool, align, size, timeout);
#define VIDEO_STM32_HEAP_FREE(block) k_heap_free(&video_stm32_buffer_pool, block)
#endif
#if defined(CONFIG_VIDEO_BUFFER_USE_SHARED_MULTI_HEAP)
#include <zephyr/multi_heap/shared_multi_heap.h>
#define VIDEO_STM32_HEAP_ALLOC(buffer_pool, align, size, timeout) \
shared_multi_heap_aligned_alloc(CONFIG_VIDEO_BUFFER_SMH_ATTRIBUTE, align, size)
#define VIDEO_STM32_HEAP_FREE(buffer_pool, block) shared_multi_heap_free(block)
#else
K_HEAP_DEFINE(video_stm32_buffer_pool, CONFIG_VIDEO_BUFFER_POOL_SZ_MAX);
#define VIDEO_STM32_HEAP_ALLOC(buffer_pool, align, size, timeout) \
k_heap_aligned_alloc(buffer_pool, align, size, timeout);
#define VIDEO_STM32_HEAP_FREE(buffer_pool, block) k_heap_free(buffer_pool, block)
#endif
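To illustrate the parameterized-macro idea without pulling in Zephyr headers, here is a self-contained sketch: `fake_heap` and `fake_heap_aligned_alloc` are hypothetical stand-ins for `k_heap`/`shared_multi_heap`, showing how a `buffer_pool` argument lets two call sites share one macro.

```c
#include <stddef.h>
#include <stdint.h>

struct fake_heap { uint8_t mem[256]; size_t used; };

/* Minimal bump allocator that honors the requested alignment. */
static void *fake_heap_aligned_alloc(struct fake_heap *h, size_t align,
				     size_t size)
{
	uintptr_t base = ((uintptr_t)h->mem + h->used + align - 1) &
			 ~(uintptr_t)(align - 1);
	size_t off = base - (uintptr_t)h->mem;

	if (off + size > sizeof(h->mem)) {
		return NULL;
	}
	h->used = off + size;
	return (void *)base;
}

/* With a buffer_pool argument, one macro serves both video_common
 * and any driver-local pool, as in the suggestion above. */
#define HEAP_ALLOC(buffer_pool, align, size) \
	fake_heap_aligned_alloc(buffer_pool, align, size)

static struct fake_heap common_pool, driver_pool;
```

This is only a sketch of the macro shape; the real implementation would dispatch to `k_heap_aligned_alloc()` or `shared_multi_heap_aligned_alloc()` under the existing `#if` guard.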

I am asking for your feedback on this, and if this does not make sense, the proposed solution also makes sense to me!

Contributor Author

@iabdalkader iabdalkader Jan 23, 2025


How about moving the video_common.c macros (except K_HEAP_DEFINE()) to video_common.h,

Either way is fine with me. There's no video_common.h though, new file? Or do you mean video.h?

Also we'll need something like this in drivers if we add the pool arg:

#if defined(CONFIG_VIDEO_BUFFER_USE_SHARED_MULTI_HEAP)
#define VIDEO_BUFFER_POOL   (0)
#else
K_HEAP_DEFINE(VIDEO_BUFFER_POOL, CONFIG_VIDEO_BUFFER_POOL_SZ_MAX*CONFIG_VIDEO_BUFFER_POOL_NUM_MAX);
#endif

Collaborator


There's no video_common.h though, new file? Or do you mean video.h?

Ah, you are right. This would be a new file, or, if you believe you could ever want to call that from the application, that could justify putting it in the public API in video.h, in which case it becomes part of the public video API rather than an implementation detail (needs a deprecation warning if the API is changed, ideally unit tests too, etc.).

Also we'll need something like this in drivers if we add the pool arg:

That is a good idea! That would allow skipping the pool arg altogether if there is a local macro that defines the VIDEO_BUFFER_POOL to use locally. More convenient that way!

An example of "xxx_common.h" file for USB:
https://github.com/zephyrproject-rtos/zephyr/blob/main/drivers/usb/udc/udc_common.h

Contributor Author


Ok. How about now?

Collaborator


Thank you for taking the effort of refactoring this in generic APIs!

@iabdalkader iabdalkader force-pushed the stm32_video_smh branch 4 times, most recently from 8dec094 to 2452a5f on January 23, 2025 16:19
josuah previously approved these changes Jan 23, 2025
Collaborator

@josuah josuah left a comment


It is possible that other collaborators will have a better outlook of how to handle this.

At my level, it looks like the best way to implement it without completely reorganizing the video APIs (i.e. switch to net_buf or rtio for buffer management).

Thank you very much!

josuah previously approved these changes Jan 23, 2025
Collaborator

@josuah josuah left a comment


Good catch, I missed it.

@ngphibang
Contributor

ngphibang commented Jan 24, 2025

Just a first look, perhaps I missed something, but I don't understand why you need to expose the VIDEO_HEAP_ALLOC macro (old name VIDEO_COMMON_HEAP_ALLOC) to use it in a driver. Why not just use the video_buffer_alloc() API?

@iabdalkader
Contributor Author

iabdalkader commented Jan 24, 2025

Why not just use the video_buffer_alloc() API?

Because, as it stands, VIDEO_COMMON_HEAP_ALLOC uses the pool defined in video_common.c, which is the main video buffer FIFO. video_stm32 allocates another, separate buffer and copies it into this FIFO. If you have only one buffer in that FIFO (CONFIG_VIDEO_BUFFER_POOL_SZ_MAX=1, a very common case) and let video_stm32 reserve it, it will never be able to dequeue another buffer to copy to.

One way to fix this is to require a minimum of two buffers for this driver, but I don't know how to do that, or if it's even possible, is it?

@ngphibang
Contributor

ngphibang commented Jan 24, 2025

Ok, I understand the context now.

    1. The DCMI driver needs an additional buffer for its own use (apart from the main video buffer pool used for the whole camera pipeline) => for this, I think some aspects need to be investigated too, e.g. whether the memcpy() used here is optimal, etc. But well, this is not in the scope of this PR.
    2. This PR adds an option to allocate this additional buffer (required solely by the DCMI) either on the shared multi-heap (if CONFIG_VIDEO_BUFFER_USE_SHARED_MULTI_HEAP is enabled) or on the DCMI driver heap.

So, I think you just need to do it in the DCMI driver. Refactoring video_common to expose the VIDEO_HEAP_ALLOC macro is not the right way to go here because:

  • VIDEO_HEAP_ALLOC and video_buffer_alloc() are for allocating buffers for the main pool, so their heap should always be the video_buffer_pool heap. The additional buffer needed by the DCMI is something related only to the DCMI, not to the main video buffer pool.
  • Moreover, the refactoring does not save any lines of code.

Even if we want a "common" macro to allocate buffers on any heap, it should be defined system-wide, not inside the video subsystem (but again, it does not save many lines of code).

@iabdalkader
Contributor Author

iabdalkader commented Jan 24, 2025

So, I think you just need to do it in the DCMI driver. Refactoring the video_common to expose the macro VIDEO_HEAP_ALLOC is not the right way to do here because

You mean duplicate the macro? The initial PR did that exactly, but then I was asked to move it to common header. Are we all in agreement this time that it needs to be duplicated?

Moreover, the refactoring does not help to save number of code lines.

It does, because without it, we'll have to duplicate the macro more or less.

Alternatively, requiring a minimum number of buffers (in the case of stm32 >= 2) for stm32 is another valid option, and then we can use video_buffer_aligned_alloc. The only drawback I see is that using CONFIG_VIDEO_BUFFER_POOL_NUM_MAX from the application might be confusing as it will be -1 for the DCMI.

@ngphibang
Contributor

ngphibang commented Jan 24, 2025

You mean duplicate the macro? The initial PR did that exactly, but then I was asked to move it to common header. Are we all in agreement this time that it needs to be duplicated?

Sorry for that, cc @josuah (?). It is used only in one place, so maybe you don't need to define the macro, just:

#if defined(CONFIG_VIDEO_BUFFER_USE_SHARED_MULTI_HEAP)
shared_multi_heap_aligned_alloc();
#else
k_heap_alloc();
#endif

? But if it needs to be duplicated, we can do it. There is existing code duplication in Zephyr, especially when you look at the .conf and overlay files.

As explained above, VIDEO_HEAP_ALLOC or video_buffer_alloc() is for allocating buffers on the video buffer pool heap. If we want to generalize it for allocation on any particular heap, it should be a utility macro defined system-wide so that other, non-video drivers can use it as well.

There are examples, such as a display driver that needs a specific display heap for buffer allocation as here, so it does not mean that the display driver should use the refactored VIDEO_HEAP_ALLOC macro (?).

Alternatively, requiring a minimum number of buffers (in the case of stm32 >= 2) for stm32 is another valid option, and then we can use video_buffer_aligned_alloc. The only drawback I see is that using CONFIG_VIDEO_BUFFER_POOL_NUM_MAX from the application might be confusing as it will be -1 for the DCMI.

In general, it could be an option. The DCMI could use the common video buffer pool for its own purposes as well. The CONFIG_VIDEO_BUFFER_POOL_NUM_MAX default value is already 2. But the problem is that this config can be changed accidentally/easily by the application, and the application does not know that the DCMI driver needs an additional buffer.

A similar option which was discussed here is to use the system heap instead of the driver heap and extend it in the driver with CONFIG_HEAP_MEM_POOL_ADD_SIZE_XXX, but if I remember well it does not add the additional required size but specifies the whole size?? (The discussion was a long time ago; I need to recall it.)

@iabdalkader
Contributor Author

so it does not mean that display driver should use the refactored VIDEO_HEAP_ALLOC macro (?).

Actually it should, because it reserves a chunk of memory that it may not use at all.

But the problem is this config can be changed accidentally / easily by the application and the application does not know that the DCMI driver needs an additional buffer.

I thought about this and have a better idea. What if video_stm32_dcmi.c dequeued a video buffer instead of allocating raw memory? The application allocates video buffers with video_buffer_aligned_alloc and enqueues them with video_enqueue. Then when video_stm32_dcmi_stream_start is called it will dequeue one video buffer and hold it until streaming is stopped. Note that the driver will fail to build if the MAX buffers is less than 2 for stm32 video.

Instead of reserving a static (possibly unaligned) buffer for DCMI,
this patch reserves and holds one of the video buffers to use as the
main DCMI buffer. This buffer will be aligned (using the alignment
specified in the config) and will either be allocated from `video_common`
pool or a shared multi-heap (if enabled).

Signed-off-by: Ibrahim Abdalkader <[email protected]>
@ngphibang
Contributor

ngphibang commented Jan 24, 2025

so it does not mean that display driver should use the refactored VIDEO_HEAP_ALLOC macro (?).

Actually it should, because it reserves a chunk of memory that it may not use at all.

I think you misunderstood my point here. I mean the display driver could use a common macro like HEAP_ALLOC() defined system-wide (to save some lines of code), but it should not include video_common.h and use VIDEO_HEAP_ALLOC(), because VIDEO_HEAP_ALLOC() is for the video subsystem.

To be clearer, take a look at the signature of the newly refactored VIDEO_HEAP_ALLOC:

#define VIDEO_HEAP_ALLOC(buffer_pool, align, size, timeout)

there is nothing specific to video anymore, so that's why it should be HEAP_ALLOC(buffer_pool, align, size, timeout), defined system-wide so that any video / display / other drivers could use it.

@iabdalkader
Contributor Author

iabdalkader commented Jan 24, 2025

there is nothing specific to video anymore, so that's why it should be HEAP_ALLOC(buffer_pool, align, size, timeout), defined system-wide so that any video / display / other drivers could use it.

I see. Well, I just need to solve my immediate problem right now: use SMH if enabled. I imagine adding a general-purpose macro/public API will require reviews, docs, tests, etc.

Please see the updated commit; if it's not good for any reason, I'll duplicate the macro in video_stm32_dcmi.c.

@ngphibang
Contributor

Thanks. I need some time to understand the context / purpose of the additional buffer in the DCMI driver (I missed the review of this driver). Otherwise, @CharlesDias may have a better look.

@iabdalkader
Contributor Author

iabdalkader commented Jan 24, 2025

I need some time to understand the context / purpose of the additional buffer in the DCMI driver

No worries. FWIW, there's no reason not to use the FIFO to pass buffers between the driver and the application without memcpy (i.e., put the buffer back after capture and get another one). However, if the user/application does not dequeue, process, and enqueue fast enough, the FIFO on the DCMI side will underrun, and the driver will have to stop streaming and possibly restart later. This, I believe, is the only reason for the additional buffer. Is it fixable? Probably yes, but again, just trying to fix my immediate problem :)

@josuah
Collaborator

josuah commented Jan 24, 2025

Thanks all for this in-depth study!

One way to fix this is to require a minimum of two buffers for this driver, but I don't know how to do that, or if it's even possible, is it?

It is possible to declare it, although it is not taken advantage of through the samples or API:

uint8_t min_vbuf_count;

maybe don't need to define the macro, just #if [...] shared_multi_heap_aligned_alloc(); #else k_heap_alloc();

That sounds good, it would be a more lightweight intermediate solution for a more general fix coming later.

What if video_stm32_dcmi.c dequeued a video buffer instead of allocating raw memory? The application allocates video buffers with video_buffer_aligned_alloc and enqueues them with video_enqueue.

That is how most video drivers seem to work, and if it is possible to make DCMI work that way, that avoids the problem of local video memory allocation in the driver.

context / purpose of the additional buffer in the DCMI driver

The way the buffers are currently loaded/unloaded:

HAL_DCMI_Suspend(hdcmi);
vbuf = k_fifo_get(&dev_data->fifo_in, K_NO_WAIT);
if (vbuf == NULL) {
	LOG_DBG("Failed to get buffer from fifo");
	goto resume;
}
vbuf->timestamp = k_uptime_get_32();
memcpy(vbuf->buffer, dev_data->buffer, vbuf->bytesused);
k_fifo_put(&dev_data->fifo_out, vbuf);
resume:
HAL_DCMI_Resume(hdcmi);

I do not see any API call that allows updating that pData to point to the buffer used for I/O:

https://github.com/zephyrproject-rtos/hal_stm32/blob/1d1f81866ccbaa6e84e9960ed763e005d1e45560/stm32cube/stm32f7xx/drivers/include/stm32f7xx_hal_dcmi.h#L559

HAL_StatusTypeDef HAL_DCMI_Start_DMA(DCMI_HandleTypeDef *hdcmi, uint32_t DCMI_Mode, uint32_t pData, uint32_t Length);

Maybe the hardware does not allow updating the buffer without stopping/starting the engine?

@iabdalkader
Contributor Author

iabdalkader commented Jan 24, 2025

That is how most video drivers work it seems, and if it is possible to make DCMI work that way, that avoids the problem of local video memory allocation in the driver.

I've already implemented this and updated the PR, no local video buffer allocation but still uses memcpy as before, this has not changed.

Maybe the hardware does not allow to update the buffer without stopping/starting the engine?

The DMA does allow updating the target address when not in use (in general or just in double-buffer mode, I can't remember), but either way this is a different issue, which I'm not trying to fix. However, this PR is a step in the right direction: you need to use video buffers before you can remove the memcpy anyway.

Collaborator

@josuah josuah left a comment


I missed the update. Very good first step indeed!
I prefer it over what I originally suggested (previous LGTM) as I did not think of this idea.
+1 for my part!

Comment on lines +250 to +253
data->vbuf = k_fifo_get(&data->fifo_in, K_NO_WAIT);

if (data->vbuf == NULL) {
	LOG_ERR("Failed to dequeue a DCMI buffer.");
Collaborator


After this commit, if a video buffer is not passed before start(), this would return an error.
I think that all examples in main would still work:

here: 2 buffers
https://github.com/zephyrproject-rtos/zephyr/blob/main/samples/drivers/video/tcpserversink/src/main.c#L41
https://github.com/zephyrproject-rtos/zephyr/blob/main/samples/drivers/video/capture_to_lvgl/src/main.c#L22

or otherwise, CONFIG_VIDEO_BUFFER_POOL_NUM_MAX
https://github.com/zephyrproject-rtos/zephyr/blob/main/samples/drivers/video/capture/src/main.c#L85

Contributor Author


Yes, the applications should still work; mine still works without changes:

App: allocates 2 buffers.
App: enqueues 2 buffers in fifo_in.
App: starts streaming.
Driver: dequeues buffer 1 from fifo_in and holds it for use as DCMI buffer.
Driver: Frame ready, dequeue buffer 2 from fifo_in, memcpy, enqueue in fifo_out.
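The steps above can be sketched as a toy model, with a plain-C FIFO standing in for Zephyr's k_fifo and byte arrays for the buffers (all names here are illustrative stand-ins, not the driver's actual code):

```c
#include <string.h>

#define NBUF  2
#define BUFSZ 16

struct vbuf { unsigned char data[BUFSZ]; };
struct fifo { struct vbuf *q[4]; int head, tail; };

static void fifo_put(struct fifo *f, struct vbuf *b) { f->q[f->tail++] = b; }
static struct vbuf *fifo_get(struct fifo *f)
{
	return (f->head == f->tail) ? NULL : f->q[f->head++];
}

static struct vbuf bufs[NBUF];
static struct fifo fifo_in, fifo_out;

/* Runs the five steps and returns the buffer the app would dequeue. */
static struct vbuf *simulate(void)
{
	/* App: enqueue both buffers, then start streaming. */
	fifo_put(&fifo_in, &bufs[0]);
	fifo_put(&fifo_in, &bufs[1]);

	/* Driver, stream start: hold buffer 1 as the DCMI target. */
	struct vbuf *dcmi = fifo_get(&fifo_in);
	memset(dcmi->data, 0xAB, BUFSZ);	/* "captured" frame */

	/* Driver, frame ready: dequeue buffer 2, memcpy, enqueue out. */
	struct vbuf *out = fifo_get(&fifo_in);
	memcpy(out->data, dcmi->data, BUFSZ);
	fifo_put(&fifo_out, out);

	return fifo_get(&fifo_out);
}
```

With only one buffer enqueued, the second fifo_get() would return NULL, which is the SZ_MAX=1 problem discussed earlier in the thread.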

@CharlesDias
Contributor

Hi, @iabdalkader. Thank you for your contribution! This is a great improvement! :)

@ngphibang and @josuah, is there another way to release the video buffers besides using the video_buffer_release? I'm wondering if there's a possibility for the video buffer to be released while the DCMI driver is running. I believe this isn't possible under normal usage conditions.

Additionally, I tested the capture_to_lvgl sample on MiniSTM32H743 and it still worked.
It has my +1!

@iabdalkader iabdalkader changed the title drivers: video: video_stm32_dcmi: Use shared multi-heap if enabled. drivers: video: video_stm32_dcmi: Use video buffers for DCMI buffer. Jan 25, 2025
@josuah
Collaborator

josuah commented Jan 25, 2025

Thank you for testing it!

is there another way to release the video buffers besides using the video_buffer_release?

If the application decides to use a vbuf after it was enqueued() and before it was dequeued(), AFAIU, it is undefined behavior (as the buffer is expected to be processed by the driver). The application is expected to forget a vbuf pointer as soon as it is passed to enqueue().

AFAICT, the current APIs give the driver the freedom of how many buffers it holds and in which order they are released.

void video_buffer_release(struct video_buffer *vbuf)

The video buffer variables are not exported; there is no way to access them from the outside other than through the provided functions.

Labels
area: Video Video subsystem platform: STM32 ST Micro STM32