Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add HW assistant copy interface #243

Merged
merged 2 commits into from
Dec 3, 2020
Merged

Conversation

XinfengZhang
Copy link
Contributor

Signed-off-by: XinfengZhang [email protected]

Copy link
Contributor

@fhvwy fhvwy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mention of GPUs in the name doesn't seem relevant. Most systems (not Intel) implement video independently of any programmable GPU components, and indeed a GPU may well not be present at all. In those cases I would expect it this sort of thing to be implemented with some kind of separate DMA engine able to read/write to all relevant pieces of memory (this seems possible in the GPU case as well since such a DMA engine may be available there too).

The mention of acceleration isn't obviously right either. The intent is presumably that the copy should be offloaded completely to minimise CPU use, not that it should be faster than just doing it directly on the CPU?

va/va.c Show resolved Hide resolved
va/va.c Outdated Show resolved Hide resolved
va/va.h Outdated Show resolved Hide resolved
va/va_backend.h Outdated Show resolved Hide resolved
va/va.h Outdated Show resolved Hide resolved
va/va.h Outdated Show resolved Hide resolved
va/va.h Outdated Show resolved Hide resolved
@XinfengZhang XinfengZhang changed the title add gpu accelarate copy interface add HW assistant copy interface Oct 16, 2018
@XinfengZhang
Copy link
Contributor Author

@fhvwy hi Mark, I changed it to HW assistant copy. actually, most cases, it is faster than CPU side, for example, copy tile surface to linear memory. offload CPU usage is also an option.

va/va.c Show resolved Hide resolved
@XinfengZhang XinfengZhang force-pushed the gpu_cp branch 2 times, most recently from 84ba876 to 6fcb471 Compare February 25, 2020 08:34
va/va.h Outdated Show resolved Hide resolved
@dmitryermilov
Copy link
Contributor

Hi @XinfengZhang ,

I still have the following proposals :

  1. within dGFX we should consider cross adapter (or at least cross vaDisplay) copy. So probably VACopyObject should include VADisplay for VASurfaceID/VABufferID.
  2. We need an input rectangle and coordinates (a point) at dst surface where to put data.
  3. We need to provide an example how to use this interface for video->system and system->video memory copy. Probably it even won't require interface change. E.g. system memory can be represented by vaSurfaceId with userptr.
  4. We need to consider supporting cases when planes of surfaces in system memory are allocated by different malloc calls (ffmpeg, HB do it).

Without it, I'm afraid the interface won't be as powerful and useful as it can be. Do the proposals make sense to you?

@XinfengZhang
Copy link
Contributor Author

XinfengZhang commented Aug 12, 2020

I still have the following proposals :

  1. within dGFX we should consider cross adapter (or at least cross vaDisplay) copy. So probably VACopyObject should include VADisplay for VASurfaceID/VABufferID.
  2. We need an input rectangle and coordinates (a point) at dst surface where to put data.

it sound like composition? should we call VPP to handle it?

  1. We need to provide an example how to use this interface for video->system and system->video memory copy. Probably it
    even won't require interface change. E.g. system memory can be represented by vaSurfaceId with userptr.

agree, we should add a sample app in https://github.com/intel/libva-utils

  1. We need to consider supporting cases when planes of surfaces in system memory are allocated by different malloc calls (ffmpeg, HB do it).

there are a lot of restriction of systeam memory allocated by application through malloc. such as the start address should be aligned with page (4k)

Without it, I'm afraid the interface won't be as powerful and useful as it can be. Do the proposals make sense to you?

@dmitryermilov
Copy link
Contributor

I still have the following proposals :

  1. within dGFX we should consider cross adapter (or at least cross vaDisplay) copy. So probably VACopyObject should include VADisplay for VASurfaceID/VABufferID.
  2. We need an input rectangle and coordinates (a point) at dst surface where to put data.

it sound like composition? should we call VPP to handle it?

Why VPP? Why composition? Please see how other similar interfaces are defined:
https://docs.microsoft.com/en-us/windows/win32/api/d3d11/nf-d3d11-id3d11devicecontext-copysubresourceregion

  1. We need to consider supporting cases when planes of surfaces in system memory are allocated by different malloc calls (ffmpeg, HB do it).

there are a lot of restriction of systeam memory allocated by application through malloc. such as the start address should be aligned with page (4k)

And ? Do you want to say we won't be able all scenarios. It's okay. But we should handle the scenarios which we can handle. Plus anyway, this're discussing interfaces, not implementation. Interacts doesn't care about alignment,

@XinfengZhang XinfengZhang force-pushed the gpu_cp branch 2 times, most recently from d0535b1 to fe5d9d7 Compare August 25, 2020 08:33
va/va.h Outdated Show resolved Hide resolved
va/va.h Outdated Show resolved Hide resolved
@dmitryermilov dmitryermilov self-requested a review October 28, 2020 20:59
@XinfengZhang XinfengZhang force-pushed the gpu_cp branch 2 times, most recently from 26e7dcc to d2c92f5 Compare November 5, 2020 05:30
va/va.h Outdated Show resolved Hide resolved
va/va.h Outdated Show resolved Hide resolved
VACopyObjectType obj_type; // type of object.
union
{
VASurfaceID surface_id;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These ID types goes as:

typedef unsigned int VAGenericID;

which might have different size... Can we add some specific size to this union? Like:

    union
    {
        VASurfaceID surface_id;
        VABufferID  buffer_id;
        uint64_t reserved;
    } object;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to enlarge the size to 64bit? which compiler/platform will lead to different size of unsigned int?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure, but I am paranoid:)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could consider about the "uint32_t"/"uint64_t" "unsigned long" for future, such as 3.0? , unsigned long will be changed adaptively with the compiler. uint64_t /uint32_t will be dedicate size.

uint32_t reserved :26;
}bits;
uint32_t value;
}VACopyOption;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've also planned to expose vaMapBuffer2 API w/ flags and maybe w/ copy options... See #377 for the early draft.

Can we, please, consider whether we can apply VACopyOption to the vaMapBuffer2 API as well? I worry about va_copy_sync usage... Thought it might still be useful in vase of vaUnmap when we might or might not to wait for the sync...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry about va_copy_sync usage... Thought it might still be useful in vase of vaUnmap when we might or might not to wait for the sync...

@dvrogozh , do you mean asynchronous writing to vaSurface?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible usages:

vaMapBuffer2(VA_COPY_SYNC_NONBLOCK); // allocates (maybe) shadow copy, no
                              // copy from original surface due to VA_COPY_SYNC_NONBLOCK (we don't care about the contetn)
memcpy();
unmap(); // schedules copy operation form shadow copy to final surface, we don't wait due to VA_COPY_SYNC_NONBLOCK

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's async writing to vaSurface/vaBuffer. Above, you say "usages". To tell you the truth I don't see other examples.
But anyway, IMO, it indeed can be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it means map is block call , but unmap is un-block call , right? the interface will be a little weird. need to re-consider it. maybe also a unmap2?

@XinfengZhang XinfengZhang force-pushed the gpu_cp branch 2 times, most recently from 03b12f7 to 8b24290 Compare December 3, 2020 05:37
include sync or async. mode

Signed-off-by: Carl Zhang <[email protected]>
Author:Dmitry Rogozhkin <[email protected]>
Author:XinfengZhang <[email protected]>

Signed-off-by: XinfengZhang <[email protected]>
@XinfengZhang XinfengZhang merged commit cd54374 into intel:master Dec 3, 2020
intel-mediadev pushed a commit to intel/media-driver that referenced this pull request Dec 9, 2020
add New libva API vaCopy DDI implementation.
1. only support VACopyObjectSurface copy, VACopyObjectBuffer copy is not ready since it could be handled by CPU.
2. currently, copy sync handle mode is absent, app can call vaSyncSurface to make sure copy complete.
3. linked libva PR: intel/libva#243
4. linked libva-utils sample code PR: intel/libva-utils#203
ZhongXiaoxia pushed a commit to VCDP/media-driver that referenced this pull request Jan 26, 2021
add New libva API vaCopy DDI implementation.
1. only support VACopyObjectSurface copy, VACopyObjectBuffer copy is not ready since it could be handled by CPU.
2. currently, copy sync handle mode is absent, app can call vaSyncSurface to make sure copy complete.
3. linked libva PR: intel/libva#243
4. linked libva-utils sample code PR: intel/libva-utils#203
Shao-Feng pushed a commit to Shao-Feng/media-driver that referenced this pull request Jul 22, 2021
add New libva API vaCopy DDI implementation.
1. only support VACopyObjectSurface copy, VACopyObjectBuffer copy is not ready since it could be handled by CPU.
2. currently, copy sync handle mode is absent, app can call vaSyncSurface to make sure copy complete.
3. linked libva PR: intel/libva#243
4. linked libva-utils sample code PR: intel/libva-utils#203
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants