Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heap corruption with format conversion #715

Closed
eero-t opened this issue Aug 23, 2019 · 8 comments · May be fixed by #732
Closed

Heap corruption with format conversion #715

eero-t opened this issue Aug 23, 2019 · 8 comments · May be fixed by #732
Assignees
Labels
verifying PR: fix ready and verifying with build/test

Comments

@eero-t
Copy link

eero-t commented Aug 23, 2019

With drm-tip kernel and media stack built from git commit from few days ago, format conversion crashes.

Setup:

  • HW: SKL i5-6600K
  • OS: Ubuntu 18.04 + updates
  • Media stack:
    • kernel git://anongit.freedesktop.org/drm-tip at 37907940079eb7587087896090105abbc5b27611 2019-08-21_16-44-48 drm-tip: 2019y-08m-21d-16h-43m-51s UTC integration manifest
    • libdrm git://anongit.freedesktop.org/git/mesa/drm at 14922551aa33e7592d2421cc89cf20a860a65310 2019-08-08_17-20-27 amdgpu: add umc ras inject test configuration
    • libva git://github.com/intel/libva at 9542f7ba9e88b04b4c1a5d7d71ba686132b7313c 2019-08-19_01-08-53 trace: fix memory leak on closing the trace
    • media-driver git://github.com/intel/media-driver at 2caed01 2019-08-21_10-12-45 [VP] Added HDR PreProcess Kernel Binary and updated Preprocess Kernel to v20190820.
    • ffmpeg https://github.com/FFmpeg/FFmpeg.git at aeae6283a9d5c84c1752bd270838ad738fff3d45 2019-08-21_11-53-33 avfilter/vf_v360: remove unused header

Use-case:
ffmpeg -y -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i 1280x720p_29.97_10mb_h264_cabac.264 -vf 'hwdownload,format=uyvy422' -c:v rawvideo -f rawvideo /dev/null

Crashes to:

double free or corruption (!prev)
Aborted

Depending on input video, libc doesn't always catch the heap corruption like above, and FFmpeg segfaults instead.

Valgrind shows huge number of invalid memory accesses ending with heap corruption in media driver / libgmm:

==8762== Invalid write of size 8
==8762==    at 0xF84A8B6: _mm_storeu_si128 (emmintrin.h:721)
==8762==    by 0xF84A8B6: CpuSwizzleBlt (CpuSwizzleBlt.c:1055)
==8762==    by 0xF834D0B: GmmLib::GmmResourceInfoCommon::CpuBlt(GMM_RES_COPY_BLT_REC*) [clone .part.12] (GmmResourceInfoCommon.cpp:1491)
==8762==    by 0xDC5DF34: SwizzleSurface (media_libva.cpp:4469)
==8762==    by 0xDC5DF34: DdiMedia_GetImage(VADriverContext*, unsigned int, int, int, unsigned int, unsigned int, unsigned int) (media_libva.cpp:4619)
==8762==    by 0x679F36C: vaGetImage (va.c:1753)
==8762==    by 0xFE4032: ??? (in /opt/install/bin/ffmpeg)
==8762==    by 0xFDF492: ??? (in /opt/install/bin/ffmpeg)
==8762==    by 0x36AD86: ??? (in /opt/install/bin/ffmpeg)
==8762==    by 0x2D8FDB: ??? (in /opt/install/bin/ffmpeg)
==8762==    by 0x2DD897: ??? (in /opt/install/bin/ffmpeg)
==8762==    by 0x2AF42C: ??? (in /opt/install/bin/ffmpeg)
==8762==    by 0x2AFB80: ??? (in /opt/install/bin/ffmpeg)
==8762==    by 0x2B0633: ??? (in /opt/install/bin/ffmpeg)
==8762==  Address 0x15426040 is 0 bytes after a block of size 1,851,392 alloc'd
==8762==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8762==    by 0xDA55912: MOS_AllocAndZeroMemory (mos_utilities.c:4020)
==8762==    by 0xDC6C3FE: DdiMediaUtil_CreateBuffer(_DDI_MEDIA_BUFFER*, mos_bufmgr*) (media_libva_util.cpp:684)
==8762==    by 0xDC58A59: DdiMedia_CreateImage(VADriverContext*, _VAImageFormat*, int, int, _VAImage*) (media_libva.cpp:4016)
==8762==    by 0x679F20D: vaCreateImage (va.c:1695)
==8762==    by 0xFE4005: ??? (in /opt/install/bin/ffmpeg)
==8762==    by 0xFDF492: ??? (in /opt/install/bin/ffmpeg)
==8762==    by 0x36AD86: ??? (in /opt/install/bin/ffmpeg)
==8762==    by 0x2D8FDB: ??? (in /opt/install/bin/ffmpeg)
==8762==    by 0x2DD897: ??? (in /opt/install/bin/ffmpeg)
==8762==    by 0x2AF42C: ??? (in /opt/install/bin/ffmpeg)
==8762==    by 0x2AFB80: ??? (in /opt/install/bin/ffmpeg)
==8762== 

valgrind: m_mallocfree.c:307 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
valgrind: Heap block lo/hi size mismatch: lo = 1851456, hi = 0.
This is probably caused by your program erroneously writing past the
end of a heap block and corrupting heap metadata.
@XinfengZhang
Copy link
Contributor

@eero-t thanks report. will check it.

@XinfengZhang XinfengZhang self-assigned this Sep 3, 2019
XinfengZhang added a commit to XinfengZhang/media-driver that referenced this issue Sep 22, 2019
fixes intel#715
the swizzleSurface function will call GmmLib::GmmResourceInfoCommon::CpuBlt
and CpuSwizzleBlt, in this function _mm_storeu_si128  will write some invalid memory
it is not rootcause, jus a finding. should not merged until real rootcause is found

Signed-off-by: XinfengZhang <[email protected]>
@XinfengZhang
Copy link
Contributor

@eero-t , could you help to try #732 , to check whether the issue still there. it is not final solution. just some hint

@eero-t
Copy link
Author

eero-t commented Sep 23, 2019

@eero-t , could you help to try #732 , to check whether the issue still there. it is not final solution. just some hint

Why, were you unable to reproduce the issue?

(As can be seen from the backtrace, it's some kind of issue in SIMD optimized CpuBlt() implementation, either off-by-one, or invalid handling of the unaligned part of the blit. #732 seems to replace that with pixel-by-pixel variant, so it shouldn't have the issue that's in the optimized routine.)

@eero-t
Copy link
Author

eero-t commented Mar 3, 2020

What's the status of this bug? The indicated use-case doesn't crash anymore after #757 fix, but I'm still seeing larger number of Valgrind warnings (although libdrm/gmmlib/media-driver were compiled with valgrind headers present).

@XinfengZhang
Copy link
Contributor

@eero-t , almost all of the violation is because of gpu memory allocation, when we map the buffer, valgrind could not get the real size of the buffer.

@XinfengZhang XinfengZhang added the verifying PR: fix ready and verifying with build/test label Mar 10, 2020
@eero-t
Copy link
Author

eero-t commented Jul 13, 2020

@eero-t , almost all of the violation is because of gpu memory allocation, when we map the buffer, valgrind could not get the real size of the buffer.

Of course it can. You just add Valgrind annotation define for it, like e.g. libdrm does:
https://github.com/freedesktop/mesa-drm/blob/master/intel/intel_bufmgr_gem.c#L1485

Besides annotating allocations, you obviously need to annotate also frees, and tell Valgrind something about the memory area types, like is done in the linked libdrm example.

If driver uses those mappings as a memory pool instead of direct allocations, Valgrind has annotation support for that too, to make memory access checks more accurate with them: https://valgrind.org/docs/manual/mc-manual.html#mc-manual.mempools

@XinfengZhang
Copy link
Contributor

@eero-t , thanks, to add these annotation, we should include valgrind header file and detect valgrind installation. right?

@eero-t
Copy link
Author

eero-t commented Jul 14, 2020

Adding valgrind headers to the system before build is a user responsibility (on Debian derived distros, this would mean doing just "sudo apt install valgrind-dev").

Media driver needs to:

  • Detect presence of valgrind headers
  • Enabled Valgrind annotations when headers are present
  • Annotate non-standard[1] memory allocations + frees for Valgrind so that Valgrind knows when accesses to these memory areas are valid

Examples for all of these can be found from libdrm.

As I mentioned in #784 (which I think should be re-opened until it's actually fixed), driver already has some annotations, but either:

  • they don't work (driver build doesn't properly detect Valgrind headers precence or enabled annotations),
  • annotations are missing, or
  • driver has memory access bugs that should be fixed

[1] Valgrind already catches syscalls like mmap() and other libraries (like libc & libdrm) already annotate their alloc functions, but Valgrind doesn't know about results of direct i915 ioctls done by media driver (or gmmlib?) until they're properly annotated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
verifying PR: fix ready and verifying with build/test
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants