Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random crashes of python in latest dev version of Pillow with pillow_heif #8439

Closed
bigcat88 opened this issue Oct 5, 2024 · 12 comments
Closed

Comments

@bigcat88
Copy link
Contributor

bigcat88 commented Oct 5, 2024

Good day, everyone.

I'll try to describe the situation briefly.

First, I want to clarify that I'm not certain this is a bug in Pillow. I created this issue so that knowledgeable contributors could help identify any recent changes in the repository that might be causing the problem I'm experiencing.

About two weeks ago, when using the development version of Pillow, some actions in pillow_heif repo started to fail: run link.

Now that I have some time to investigate, I'm encountering a situation where the root cause of the error is still unclear to me.

Here is the script I'm using for testing:

import os
import pillow_heif
from pathlib import Path
from PIL import Image, ImageSequence

pillow_heif.register_heif_opener()

os.chdir(Path(__file__).parent.parent.joinpath("tests/images/heif/"))

if __name__ == "__main__":
    im = Image.open("zPug_3.heic")
    im.load()
    for frame in ImageSequence.Iterator(im):
        print("before frame.load()")
        frame.load()
        print("after frame.load()")
        assert len(frame.tobytes()) > 0
        print("after frame.tobytes()")
    print("exit")

It can crash, or sometimes it does not crash.
Sometimes I even see "exit" in output and then crash occurs.

I'm running this script from the cloned pillow_heif repository, creating a "dev" folder at the root (if needed for quick reproduction) and placing it there.

The issue is that my Python interpreter randomly crashes when using the development version of Pillow, and I haven't been able to pinpoint the exact location of the crash.

Nothing has changed in pillow_heif itself in last months (I'm waiting for HDR image support before refactoring the code).

I've tried running the script with other versions of Pillow, and everything works fine. I've also tried adding print statements inside Pillow to locate the failure, but the crash seems random, and I haven't been able to identify the exact spot where it occurs.

I'm testing this on macOS (but can also reproduce it on Linux). Any insights or tips on where to investigate, or information on what may have changed in the Pillow repo 2–3 weeks ago that could lead to these crashes, would be greatly appreciated.

Some info that I have found:

1ee3bd1 - on this commit crash already occurs
08d9c89 - repo at this commit is good

Thank you!

@wiredfool
Copy link
Member

This sort of thing sounds like a memory safety issue. It's possible that just running to crash under gdb would be enough to tell you where the crash is happening, and that should be enough to start to narrow down the issue. Alternately, you can run under valgrind on linux (but that's slow).

If it's a consistent failure, the other option is to use git bisect and narrow down to the commit -- though since we don't squash commits on main, there's a chance of having non-working commits unrelated to this at any given point.

@hugovk
Copy link
Member

hugovk commented Oct 5, 2024

git bisect points to 84e275d from PR #8390:

84e275d90629bfdf508f81a1142eddd5d5ea46e4 is the first bad commit
commit 84e275d90629bfdf508f81a1142eddd5d5ea46e4
Date:   Wed Sep 18 20:27:35 2024 +1000

    Loading does not change mode

 src/PIL/ImageFile.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

@bigcat88
Copy link
Contributor Author

bigcat88 commented Oct 5, 2024

additional information that I may have forgotten to mention is that zPug_3.heic is three images (or frames) in one file:

<pillow_heif.as_plugin.HeifImageFile image mode=RGB size=64x64>
<pillow_heif.as_plugin.HeifImageFile image mode=L size=64x64>
<pillow_heif.as_plugin.HeifImageFile image mode=RGB size=96x64>

git bisect points to 84e275d from PR #8390:

thanks, very good finding

@bigcat88
Copy link
Contributor Author

bigcat88 commented Oct 5, 2024

It turns out that when the second frame has a different mode, a new Image.core.new is not created after that commit, and the mode does not change, therefore, in pillow_heif, the wrong stride is calculated based on wrong mode(RGB vs L), and the libde265 decoder crashes as it expects more data.

Edit: the reason is correct(wrong mode for Image.core), and it crashes here:

s = d.decode(data)

Can we revert this commit 84e275d ?

@wiredfool
Copy link
Member

History says that this hasn't changed since the PIL fork, except for the most recent batch of changes. I can see where we could get multi image container formats (gif or webpanim or tiff) where the frame type could change, esp something like rgb->rgba, though our internal decoders might handle that directly. Obviously at least one external decoder has been relying on this.

I'd say revert, but finding a good test case for this might be the hardest bit.

@radarhere
Copy link
Member

radarhere commented Oct 6, 2024

My commit was inspired by @homm's comment that he considered it a bug when an image changed size after it was loaded.

I think load() is a method that many users are probably not even aware of, and so they might the behaviour unexpected if it is internally called and the mode changes.

from PIL import Image
im = Image.open(...)
print(im.mode)  # L
print(im.copy().mode)  # RGB

When I made the mode change, none of our internal plugins had to be updated to handle it. GifImagePlugin changes modes between frames, sure, but the the core image can be reset before loading, in its own load_prepare(). TiffImagePlugin creates a new core image at the end of seek(). It doesn't look like WebP does change modes between frames.

@bigcat88 a few questions - do you disagree with my idea in theory? Do you think pillow_heif could be updated handle this new way of doing things? Is the problem simply that this change was made suddenly, and you would prefer a deprecation?

@bigcat88
Copy link
Contributor Author

bigcat88 commented Oct 6, 2024

When I made the mode change, none of our internal plugins had to be updated to handle it.

I don't know how common it is that frames have different modes.
I know that in modern formats like AVIF or HEIF this is supported, and this can happen.
But I do never see real world examples of this, and not custom crafted images

@bigcat88 a few questions - do you disagree with my idea in theory? Do you think pillow_heif could be updated handle this new way of doing things? Is the problem simply that this change was made suddenly, and you would prefer a deprecation?

I don't mind updating "pillow_heif", that's why I created issue to determine the way to do it, knowing that here everyone can always get an answer to the question of what is best

TiffImagePlugin creates a new core image at the end of seek()

After receiving a hint that TiffImagePlugin does this in the seek method, in theory it should be done the same way.

Like this:

    def seek(self, frame: int):
        if not self._seek_check(frame):
            return
        self.__frame = frame
        self._init_from_heif_file(frame)  # this fills  `self._size` and `self._mode` with new frame info

        if pil_version[:3] != "10.":
            # Pillow 11.0+
            # We need to create a new core image object on second and
            # subsequent frames in the image. Image may be different size/mode.
            # https://github.com/python-pillow/Pillow/issues/8439
            self.im = Image.core.new(self._mode, self._size)

        _exif = getattr(self, "_exif", None)  # Pillow 9.2+ do no reload exif between frames.
        if _exif is not None and getattr(_exif, "_loaded", None):
            _exif._loaded = False  # pylint: disable=protected-access

It seems to work and tests passed successfully, many thanks to everyone for help.

I think the issue can be closed?

@homm
Copy link
Member

homm commented Oct 7, 2024

I still think it's better to revert this, especially now, when we realise not only TiffImagePlugin relies on this behaviour.
https://github.com/python-pillow/Pillow/pull/8390/files#r1764864935

@radarhere
Copy link
Member

Reverted and deprecated?

I think the ideal behaviour when changing modes between frames is actually to set self._im = None at the end of seek - #8392. I think load() shouldn't change the image mode, and in theory at least, load() doesn't have to be called after seek(), so if there's a core image with the wrong mode hanging around, it can be discarded.

@homm
Copy link
Member

homm commented Oct 8, 2024

is actually to set self._im = None

This will lead to unnecessary memory allocation if all frames have the same size and mode. The previous implementation of load_prepare was straightforward: it prepared self.im with the correct mode and size, so there was no need to worry about handling self.im in the plugins.

I think load() shouldn't change the image mode

I fully agree, just like with size changes, which I also tried to fix, as you might remember. But I don't quite understand how this would happen if we return to the previous load_prepare behavior.

load() doesn't have to be called after seek()

Possibly. But I'm specifically talking about load_prepare, which I believe is the correct API for managing the self.im object.

@radarhere
Copy link
Member

Returning to the previous load_prepare behaviour wouldn't cause a problem, no. I was sort of trying to enforce the core image to either be None or the correct mode/size before calling load(), to allow for the core image to be dropped if it theoretically became unnecessary. However, once I take on board your comment about unnecessary memory allocation, that's hard to do simply. I've created #8450 to try - at the end of seek(), an external plugin could call super().seek(), and if the mode or size has changed, then self._im is set to None.

If you're not a fan of that suggestion though, then I'm happy to just revert the load_prepare() change.

@homm
Copy link
Member

homm commented Oct 8, 2024

I was sort of trying to enforce the core image to either be None or the correct mode/size before calling load()

Good point. I'll take a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants