Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add new interface vaMapBuffer2 for map operation optimization #377

Merged
merged 1 commit into from
Oct 10, 2023

Conversation

XinfengZhang
Copy link
Contributor

add new vaMapBuffer2
backend driver need some usage hint to optimize the operation

Signed-off-by: Carl Zhang [email protected]

va/va.c Outdated Show resolved Hide resolved
va/va.h Outdated
*/
typedef enum
{
VAMapBufferFlagDefault = 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to add a comment what default means (i.e. read|write). Also, I think we need to add a comment that these enum flags are being supported for bitwise operations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please, resolve

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest this text:

/** \brief VAMapBufferFlagDefault is used when there are no flag specified
 * Same as VAMapBufferFlagRead | VAMapBufferFlagWrite.
 */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are enum values , not macro, it is why I dont use VAMapBufferFlagRead | VAMapBufferFlagWrite, or next extended flag must set value to 0x4?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next extended flag must set value to 0x4?

Maybe we will just use #define VAMapBufferFlagDefault 0 and so on then?

va/va.c Outdated Show resolved Hide resolved
va/va.c Outdated
{
va_status = ctx->vtable->vaMapBuffer( ctx, buf_id, pbuf );
}
VA_TRACE_ALL(va_TraceMapBuffer, dpy, buf_id, pbuf, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VAMapBufferFlagDefault

va/va.h Outdated
*/
typedef enum
{
VAMapBufferFlagDefault = 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please, resolve

va/va.h Outdated
{
/** \brief VAMapBufferFlagDefault is used when there are no flag specified*/
VAMapBufferFlagDefault = 0,
/** \brief application will read the surface after map */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the flags be used on OR ? It's not clear from text description.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Not clear whether this VAMapBufferFlagRead | VAMapBufferFlagWrite can be done. For example:

/** \brief Bitwise flag: application will read the surface after map */

and same below for write:

/** \brief Bitwise flag: application will write the surface after map */

va/va.h Outdated
*/
typedef enum
{
VAMapBufferFlagDefault = 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest this text:

/** \brief VAMapBufferFlagDefault is used when there are no flag specified
 * Same as VAMapBufferFlagRead | VAMapBufferFlagWrite.
 */

va/va.h Outdated
{
/** \brief VAMapBufferFlagDefault is used when there are no flag specified*/
VAMapBufferFlagDefault = 0,
/** \brief application will read the surface after map */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Not clear whether this VAMapBufferFlagRead | VAMapBufferFlagWrite can be done. For example:

/** \brief Bitwise flag: application will read the surface after map */

and same below for write:

/** \brief Bitwise flag: application will write the surface after map */

@rojkov rojkov removed their request for review March 30, 2020 11:46
@IlyaSergeyev
Copy link
Contributor

@XinfengZhang What the current status? This change can significantly improve encoding performance.

/** \brief VA_MAPBUFFER_FLAG_DEFAULT is used when there are no flag specified
* same as VA_MAPBUFFER_FLAG_READ | VA_MAPBUFFER_FLAG_WRITE.
*/
#define VA_MAPBUFFER_FLAG_DEFAULT 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dvrogozh , @XinfengZhang , can we do drop VA_MAPBUFFER_FLAG_DEFAULT because VA_MAPBUFFER_FLAG_DEFAULT in fact == VA_MAPBUFFER_FLAG_READ | VA_MAPBUFFER_FLAG_WRITE

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that having explicitly said what's the behavior in case of naive default value (=0) is good. I would leave DEFAULT declared.

va/va.c Outdated
VA_FOOL_FUNC(va_FoolMapBuffer, dpy, buf_id, pbuf);

va_status = ctx->vtable->vaMapBuffer( ctx, buf_id, pbuf );
if (ctx->vtable->vaMapBuffer2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really a good optimization because of the flags can save the un-needed copy.
The misc usage of vtable->vaMapBuffer and vtable->vaMapBuffer2 within vaMapBuffer vaMapBuffer2 APIs may be verbose. For vaMapBuffer2 API, if driver does not provide the virtual function point, we can return VA_STATUS_ERROR_UNIMPLEMENTED directly. The app user should consider the fallback cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the future, I suppose, somedriver will just implement vaMapBuffer2, to be compatible with old app, need vaMapBuffer also be supported. this why I add similar fallback logic in vaMapBuffer and vaMapBuffer2

@nowrep
Copy link

nowrep commented Oct 7, 2023

What's the status of this? It would be great to have it merged.
Currently the issue in Mesa is that mapping all buffers read+write leads to bad performance on dGPUs.

@dvrogozh
Copy link
Contributor

dvrogozh commented Oct 7, 2023

If I recall correctly, it was not merged because of lack of implementation on the (intel) driver side. However, considering that new driver appeared (vaon12) and obviously it has interest in this feature, I think we need to go forward with it. PR needs to be rebased and conflicts which appeared while it was hanging around resolved (@XinfengZhang : can you, please, do that? or I can pick it up). And PR needs re-review.

@nowrep : you are looking for this PR in regards of mesa's vaon12?

@nowrep
Copy link

nowrep commented Oct 7, 2023

That sounds good, thanks!

My interest for this PR is radeonsi but it will benefit all Mesa drivers, so that also includes vaon12.

@dvrogozh
Copy link
Contributor

dvrogozh commented Oct 7, 2023

Is there any discussion on mesa side regarding this? Maybe some issue filed?

@nowrep
Copy link

nowrep commented Oct 7, 2023

This PR came up in discussion of this MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25597

Context: Mesa always mapped the buffer as write-only. Up until recently vaDeriveImage was not implemented for multi-planar formats on radeonsi, so applications would fallback to vaPut/GetImage. That's not the case anymore and it is now an issue when applications try to read from the mapped memory. The simple fix was to map the buffer as read+write, but it regressed performance when used for writing only so the change was reverted.

add new vaMapBuffer2
backend driver need some usage hint to optimize the operation

Signed-off-by: Carl Zhang <[email protected]>
@XinfengZhang XinfengZhang force-pushed the map2 branch 2 times, most recently from 15f6459 to 8f9ec28 Compare October 9, 2023 07:52
Copy link
Contributor

@dvrogozh dvrogozh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still looks good

/** \brief VA_MAPBUFFER_FLAG_DEFAULT is used when there are no flag specified
* same as VA_MAPBUFFER_FLAG_READ | VA_MAPBUFFER_FLAG_WRITE.
*/
#define VA_MAPBUFFER_FLAG_DEFAULT 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that having explicitly said what's the behavior in case of naive default value (=0) is good. I would leave DEFAULT declared.

@dvrogozh
Copy link
Contributor

dvrogozh commented Oct 9, 2023

@nowrep, @sivileri : we are ready to merge in this libva change as soon as some driver(s) will raise a PR on their side to use this new VAAPI feature. Can you, please, help to prepare a change on mesa side and get it tried out?

@nowrep
Copy link

nowrep commented Oct 9, 2023

@dvrogozh
Copy link
Contributor

Thank you @nowrep for quick follow up. Quoting perf example from https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25620#note_2119488:

"Example with patched ffmpeg to use vaMapBuffer2:

ffmpeg -vaapi_device /dev/dri/renderD128 -f lavfi -i testsrc=size=1280x720 -vf format=nv12,hwupload -c:v h264_vaapi -async_depth 1 -f null -

Before (read+write): 540fps
After  (write only): 620fps

It's around 10-15% faster depending on the resolution. At 2048x2048 it's 160fps before and 180fps after."

@dvrogozh
Copy link
Contributor

I don't see a reason to hold it on any further. Merging in.

@dvrogozh dvrogozh merged commit 45afd79 into intel:master Oct 10, 2023
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants