"Free(): invalid pointer" - during BOPWriter #1099

Closed
matteomastrogiuseppe opened this issue May 23, 2024 · 21 comments
Labels
question Question, not yet a bug ;)

Comments

@matteomastrogiuseppe

Describe the issue

I found this issue while trying to render some images through the bproc.writer.write_bop method on two native Ubuntu 22.04 machines. After the rendering, the code breaks and "free(): invalid pointer" is printed indefinitely.

I tracked the source of the error to https://github.com/DLR-RM/BlenderProc/blob/main/blenderproc/python/writer/BopWriterUtility.py: the error pops up at line 538, when a pyrender renderer object is instantiated.

I think it may be an OpenGL problem that shows up in combination with multiprocessing.
It may even be related to the issue I encountered a while ago (#1084), since I feel like memory is still not handled correctly.

A minimal way to reproduce the issue is to follow the steps at https://github.com/DLR-RM/BlenderProc/tree/main/examples/datasets/bop_object_physics_positioning.

Minimal code example

https://github.com/DLR-RM/BlenderProc/tree/main/examples/datasets/bop_object_physics_positioning

Files required to run the code

No response

Expected behavior

The code fails during the generation of the BOP data, most specifically when creating pyrender renderer objects.

BlenderProc version

2.7.0

@matteomastrogiuseppe matteomastrogiuseppe added the question Question, not yet a bug ;) label May 23, 2024
@cornerfarmer
Member

cornerfarmer commented May 24, 2024

Thanks for the hint, @matteomastrogiuseppe! I am currently investigating a few strange issues that occur in the multiprocessing of the BOP writer.
Just to make sure:

  • Does the error occur when running the original bop_object_physics_positioning example?
  • Does the error occur on the first bopwriter call each time you run?
  • Are you using 2.7.0 or 2.7.1?

@matteomastrogiuseppe
Author

  • Yes, also in other scripts whenever I use the .write_bop method.
  • Yes, this issue occurs immediately with the first bopwriter call.
  • 2.7.0, but the issue is still there with 2.7.1. Actually, an "rmdir: No such file or directory" message pops up together with "free(): invalid pointer", if that is of any help.

@matteomastrogiuseppe
Author

matteomastrogiuseppe commented May 24, 2024

Also, this issue occurs only when trying to render with the GPU. If I set bproc.renderer.set_render_devices(use_only_cpu=True), then everything goes smoothly.

@cornerfarmer
Member

Thanks! I guess with bproc 2.6.2 (which does not use multiprocessing in the BOP writer) everything would also work, right?

The problem is that I cannot reproduce this issue on my system. Which GPU do you use?

@matteomastrogiuseppe
Author

With BProc 2.6.2 I still get the free(): invalid pointer issue, but it's only displayed once, and then the code exits, instead of printing the issue endlessly. Once again, the error is issued when the pyrender.OffscreenRenderer is instantiated.

I'm using an NVIDIA RTX 2080 at the moment, but I got the same issue with another integrated NVIDIA GPU. I'm using a conda environment, but I don't think that is related.

@matteomastrogiuseppe
Author

matteomastrogiuseppe commented May 27, 2024

Recreated the problem on another machine, which returned a more informative error log. It should be the same issue, but I can't say so with 100% certainty. I get this message printed indefinitely (so, for every pool worker instantiated):

Process ForkPoolWorker-XX: (XX is the number of the process)
Traceback (most recent call last):
  File "/home/.../blender/blender-3.5.1-linux-x64/3.5/python/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/.../blender/blender-3.5.1-linux-x64/3.5/python/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/.../blender/blender-3.5.1-linux-x64/3.5/python/lib/python3.10/multiprocessing/pool.py", line 109, in worker
    initializer(*initargs)
  File "/home/.../miniforge3/envs/staliani_bproc/lib/python3.9/site-packages/blenderproc/python/writer/BopWriterUtility.py", line 538, in _pyrender_init
    renderer = pyrender.OffscreenRenderer(viewport_width=ren_width, viewport_height=ren_height)
  File "/home/.../blender/blender-3.5.1-linux-x64/custom-python-packages/lib/python3.10/site-packages/pyrender/offscreen.py", line 31, in __init__
    self._create()
  File "/home/.../blender/blender-3.5.1-linux-x64/custom-python-packages/lib/python3.10/site-packages/pyrender/offscreen.py", line 149, in _create
    self._platform.init_context()
  File "/home/.../blender/blender-3.5.1-linux-x64/custom-python-packages/lib/python3.10/site-packages/pyrender/platforms/egl.py", line 177, in init_context
    assert eglInitialize(self._egl_display, major, minor)
  File "/home/.../blender/blender-3.5.1-linux-x64/custom-python-packages/lib/python3.10/site-packages/OpenGL/platform/baseplatform.py", line 402, in __call__
    return self( *args, **named )
  File "/home/.../blender/blender-3.5.1-linux-x64/custom-python-packages/lib/python3.10/site-packages/OpenGL/error.py", line 228, in glCheckError
    raise GLError(
OpenGL.error.GLError: GLError(
        err = 12289,
        baseOperation = eglInitialize,
        cArguments = (
<OpenGL._opaque.EGLDisplay_pointer object at 0x7f77243ff5c0>,
                c_long(0),
                c_long(0),
        ),
        result = 0
)

cc: @traversaro

@cornerfarmer
Member

Thanks for the update @matteomastrogiuseppe. So it seems the error is not related to the multiprocessing but rather to pyrender/OpenGL. Are you able to instantiate the OffscreenRenderer outside of Blender?

As you could reproduce the error on multiple systems, but I could not reproduce it on ours, there seems to be some difference between our environments.
How is your env defined? Is it a conda env?
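
One way to try the renderer outside of Blender without risking the calling process is to run the probe in a fresh interpreter, since a native crash such as "free(): invalid pointer" aborts the whole process. The following is only a sketch: the probe snippet, the 64x64 viewport and the choice of the 'egl' platform are assumptions for illustration, not BlenderProc code.

```python
import subprocess
import sys

# Hypothetical probe; viewport size and the 'egl' platform are assumptions.
PYRENDER_PROBE = """
import os
os.environ.setdefault("PYOPENGL_PLATFORM", "egl")
import pyrender
renderer = pyrender.OffscreenRenderer(viewport_width=64, viewport_height=64)
renderer.delete()
print("ok")
"""


def run_isolated(snippet, timeout=120):
    """Run a snippet in a fresh interpreter; return (returncode, stdout, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout.strip(), proc.stderr.strip()


def describe(returncode):
    """Turn a subprocess return code into a short diagnosis."""
    if returncode == 0:
        return "renderer created and released"
    if returncode < 0:
        # On POSIX a negative code means the child was killed by a signal,
        # e.g. -6 for SIGABRT, which is what glibc raises on an invalid free().
        return f"killed by signal {-returncode} (native crash)"
    return f"Python exited with status {returncode}"
```

Calling run_isolated(PYRENDER_PROBE) and passing the first element of the result to describe() separates a Python-level failure (for example the GLError above) from a native abort.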

@matteomastrogiuseppe
Author

I am using a very simple conda environment.

Environment.yaml:
name: bproctest
channels:
  - conda-forge
dependencies:
  - python
  - pip 
  - pip:
    - blenderproc

Then of course, on the very first blenderproc run, it will download a bunch of dependencies.

Funnily enough, I tried to define (and not use) a random pyrender.OffscreenRenderer in the main code, before the writer.write_bop call. This does not throw an error immediately.
The error log shifts from an infinite "free(): invalid pointer" to:

Error: Python: Traceback (most recent call last):
  File "/home/icub/code/project-jl2-camozzi/jl2_workspace/src/blenderproc/code/ciao.py", line 135, in <module>
    bproc.writer.write_bop(os.path.join(args.output_dir, 'bop_data'),
  File "/home/icub/miniforge3/envs/bproctest/lib/python3.11/site-packages/blenderproc/python/writer/BopWriterUtility.py", line 177, in write_bop
    _BopWriterUtility.calc_gt_info(chunk_dirs=chunk_dirs, starting_frame_id=starting_frame_id,
  File "/home/icub/miniforge3/envs/bproctest/lib/python3.11/site-packages/blenderproc/python/writer/BopWriterUtility.py", line 837, in calc_gt_info
    scene_gt_info[im_id] = pool.map(partial(_BopWriterUtility._calc_gt_info_iteration, annotation_scale, ren_cy_offset, ren_cx_offset, im_height, im_width, K, delta, depth), scene_gt[im_id])
  File "/home/icub/blender/blender-3.5.1-linux-x64/3.5/python/lib/python3.10/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/icub/blender/blender-3.5.1-linux-x64/3.5/python/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x7f853370bc40>'. Reason: 'ValueError('ctypes objects containing pointers cannot be pickled')'

Which is after the pyrender initialization.
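
The MaybeEncodingError above appears to be a secondary failure: a worker raised an exception, and the pool could not pickle it to send it back to the parent. A plausible reading (an assumption, not confirmed in this thread) is that the worker exception was a GLError whose cArguments hold a ctypes pointer. The standard library alone reproduces that pickling behavior; the RuntimeError below is only a stand-in for the real exception.

```python
import ctypes
import pickle


def can_pickle(obj):
    """Return (True, '') if obj pickles, else (False, error message)."""
    try:
        pickle.dumps(obj)
        return True, ""
    except Exception as exc:
        return False, str(exc)


# Stand-ins for a worker exception: one with plain arguments, one that
# carries a ctypes pointer in its args (as GLError does via cArguments).
plain_error = RuntimeError("eglInitialize failed", 12289)
pointer_error = RuntimeError(
    "eglInitialize failed", ctypes.pointer(ctypes.c_long(0))
)

plain_ok, _ = can_pickle(plain_error)
pointer_ok, pointer_msg = can_pickle(pointer_error)
```

If this reading is right, the underlying error is still the EGL initialization failure, and the pickling message only hides it.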

@traversaro

traversaro commented May 27, 2024

Just to check whether this is indeed an EGL error, can you check whether it still happens after setting https://github.com/DLR-RM/BlenderProc/blob/main/blenderproc/python/writer/BopWriterUtility.py#L28 to os.environ['PYOPENGL_PLATFORM'] = 'osmesa'? This would use CPU rendering, which is much slower, but if it works fine with osmesa, that is at least a good indication of what is going wrong.
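
For reference, PyOpenGL picks its platform from the PYOPENGL_PLATFORM environment variable when it is first imported, so the variable has to be set before any OpenGL or pyrender import. A minimal sketch of a guard around that follows; the helper name and the loaded_modules parameter are hypothetical, and 'osmesa' additionally needs an OSMesa library installed on the system.

```python
import os
import sys

# Headless platform names relevant to this thread.
HEADLESS_PLATFORMS = ("egl", "osmesa")


def select_pyopengl_platform(name, loaded_modules=None):
    """Set PYOPENGL_PLATFORM, refusing if PyOpenGL was already imported.

    loaded_modules defaults to sys.modules; it is a parameter only so the
    check can be exercised without importing PyOpenGL.
    """
    if name not in HEADLESS_PLATFORMS:
        raise ValueError(f"unsupported platform: {name!r}")
    modules = sys.modules if loaded_modules is None else loaded_modules
    if "OpenGL" in modules:
        # Too late: the platform was already chosen at import time.
        raise RuntimeError("PyOpenGL is already imported; set the platform earlier")
    os.environ["PYOPENGL_PLATFORM"] = name
    return name
```

Calling select_pyopengl_platform("osmesa") at the very top of a script, before importing pyrender, would correspond to the experiment suggested above.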

@traversaro

To interpret a bit better:

OpenGL.error.GLError: GLError(
        err = 12289,
        baseOperation = eglInitialize,
        cArguments = (
<OpenGL._opaque.EGLDisplay_pointer object at 0x7f77243ff5c0>,
                c_long(0),
                c_long(0),
        ),
        result = 0

12289 in hex is 0x3001, i.e. EGL_NOT_INITIALIZED. pyrender has a bunch of related issues, see https://github.com/search?q=repo%3Ammatl%2Fpyrender+12289&type=issues . Which version of pyrender and pyopengl is used in /home/.../blender/blender-3.5.1-linux-x64/custom-python-packages/ ?
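
The decimal-to-name translation above can be done mechanically. The table below lists the error codes defined in the EGL specification; the helper function name is made up for this sketch.

```python
# EGL error codes as defined by the Khronos EGL specification.
EGL_ERROR_NAMES = {
    0x3000: "EGL_SUCCESS",
    0x3001: "EGL_NOT_INITIALIZED",
    0x3002: "EGL_BAD_ACCESS",
    0x3003: "EGL_BAD_ALLOC",
    0x3004: "EGL_BAD_ATTRIBUTE",
    0x3005: "EGL_BAD_CONFIG",
    0x3006: "EGL_BAD_CONTEXT",
    0x3007: "EGL_BAD_CURRENT_SURFACE",
    0x3008: "EGL_BAD_DISPLAY",
    0x3009: "EGL_BAD_MATCH",
    0x300A: "EGL_BAD_NATIVE_PIXMAP",
    0x300B: "EGL_BAD_NATIVE_WINDOW",
    0x300C: "EGL_BAD_PARAMETER",
    0x300D: "EGL_BAD_SURFACE",
    0x300E: "EGL_CONTEXT_LOST",
}


def egl_error_name(code):
    """Format a decimal EGL error code as 'NAME (0xHEX)'."""
    name = EGL_ERROR_NAMES.get(code, "unknown EGL error")
    return f"{name} (0x{code:04X})"


print(egl_error_name(12289))  # EGL_NOT_INITIALIZED (0x3001)
```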

@traversaro

More generally, perhaps we should call .delete() after the renderer has been used, as documented in https://pyrender.readthedocs.io/en/latest/examples/offscreen.html#running-the-renderer ? @matteomastrogiuseppe, in your example it seems that Python 3.11 and 3.10 modules are mixed; perhaps that is part of the problem?
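
A small context manager is one way to guarantee that .delete() runs even when rendering raises. This is a sketch of the idea, not what BlenderProc does; managed_renderer is a hypothetical helper that takes the renderer class as an argument so it does not import pyrender itself.

```python
from contextlib import contextmanager


@contextmanager
def managed_renderer(factory, *args, **kwargs):
    """Create a renderer via factory and always call its delete() on exit."""
    renderer = factory(*args, **kwargs)
    try:
        yield renderer
    finally:
        # Release the OpenGL context even if the body raised.
        renderer.delete()


# Intended usage (requires pyrender and a working OpenGL platform):
#
#   with managed_renderer(pyrender.OffscreenRenderer,
#                         viewport_width=640, viewport_height=480) as r:
#       color, depth = r.render(scene)
```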

@cornerfarmer
Member

Thanks for the investigation!
We use pyrender==0.1.45 in blenderproc (see https://github.com/DLR-RM/BlenderProc/blob/main/blenderproc/python/utility/DefaultConfig.py#L46). Maybe the version is too old?
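
To answer the version question for a concrete installation, the installed distributions can be queried with importlib.metadata. This only reports on the interpreter that runs it, so it would need to be executed with the Python that actually loads pyrender (here, Blender's bundled one). The function name is made up for this sketch.

```python
from importlib import metadata


def installed_versions(names=("pyrender", "PyOpenGL")):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for dist, version in installed_versions().items():
    print(f"{dist}: {version or 'not installed'}")
```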

@traversaro

traversaro commented May 27, 2024

We use pyrender==0.1.45 in blenderproc (see https://github.com/DLR-RM/BlenderProc/blob/main/blenderproc/python/utility/DefaultConfig.py#L46). Maybe the version is too old?

What about pyopengl? If I recall correctly, pyrender (which I am afraid is currently unmaintained; in another project we migrated to https://github.com/fishbotics/pyribbit explicitly to avoid the pyopengl==3.1.0 pin) was pinning pyopengl==3.1.0, which however lacks an important bugfix, i.e. under EGL it loads the legacy libGL.so instead of the modern glvnd-based libOpenGL.so, see mcfletch/pyopengl#27 for background and the code in :

@cornerfarmer
Member

I was not aware that pyrender is not maintained anymore. So if you say, switching to pyribbit would solve the error, we can do that in the next bproc version (although pyribbit is also not that well maintained...).

@traversaro

I was not aware that pyrender is not maintained anymore.

I never found any kind of announcement, but the maintainers have stopped replying to any issue or PR, so that's my assumption. See mmatl/pyrender#224 and mmatl/urdfpy#31 for a related discussion (urdfpy and pyrender have the same maintainer).

So if you say, switching to pyribbit would solve the error, we can do that in the next bproc version (although pyribbit is also not that well maintained...).

Just to be totally clear, I am not sure whether that will fix the error; it was just a hypothesis. I would first double-check that (I can align with @matteomastrogiuseppe).

(although pyribbit is also not that well maintained...).

In which sense? Indeed, the docs website has not been regenerated/polished and still points to the pyrender one, but I am not aware of any bug present in pyribbit.

@matteomastrogiuseppe
Author

Sorry for the late reply, but the error disappeared on one machine, where @traversaro performed a deep-cleaning of the NVIDIA drivers and then re-installed them with sudo ubuntu-drivers install.

I tried to do the same on another machine (where I used to get this error), and it solved the issue once again.

Next time I won't suppose that multiple computers are ground truth, sorry😁 I guess we can close this!

@traversaro

Sorry for the late reply, but the error disappeared on one machine, where @traversaro performed a deep-cleaning of the NVIDIA drivers and then re-installed them with sudo ubuntu-drivers install.

Great! No need to worry about dependencies then.

To comment a bit more on the problem on that machine: no libEGL_nvidia.so file had been installed there by an earlier NVIDIA driver installation done with the NVIDIA .run installer. After uninstalling the driver via <driverfile>.run --uninstall and installing the drivers with sudo ubuntu-drivers install, everything worked fine.
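
A quick way to check for this condition on another machine is to look for the NVIDIA EGL library on disk. The sketch below is only a heuristic: the directories are typical locations and are assumptions, since distributions and driver installers place libraries differently.

```python
from pathlib import Path

# Assumed search locations; adjust for the distribution at hand.
DEFAULT_LIB_DIRS = ("/usr/lib/x86_64-linux-gnu", "/usr/lib64", "/usr/lib")


def find_nvidia_egl(lib_dirs=DEFAULT_LIB_DIRS):
    """Return paths of libEGL_nvidia.so* files found in the given directories."""
    found = []
    for directory in lib_dirs:
        path = Path(directory)
        if path.is_dir():
            found.extend(sorted(path.glob("libEGL_nvidia.so*")))
    return found


hits = find_nvidia_egl()
if hits:
    print("NVIDIA EGL library found:", ", ".join(str(p) for p in hits))
else:
    print("No libEGL_nvidia.so* found in", ", ".join(DEFAULT_LIB_DIRS))
```

An empty result on a machine with an NVIDIA GPU would match the situation described above, although the library might also live in a directory not listed here.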

@cornerfarmer
Member

Great that it works now! Thanks for the explanation.

Regarding pyrender: I will keep that in mind and will then probably switch to pyribbit in one of the next versions.

In which sense?

This was only based on the fact that the repo contains only one more commit compared to the original repo. But if there are no further issues, it's of course fine.

@prrw

prrw commented Jun 6, 2024

I am getting this error as well. I am using 3 x NVIDIA RTX 6000 Ada. I assume this will be solved in the next versions.

@michbaum

If anyone is still suffering from this, what worked on my end (as a temporary fix) was simply importing the following at the top of my blenderproc script:

import pyrender
from pyrender.platforms import egl

To my understanding, this negates an import within a function down the line that leads to the error. Couldn't determine the exact cause though

@eclipse0922

The same problem occurred in my environment.

I downgraded the conda env's Python from 3.10 to 3.9, and that solved the problem.
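
Since the tracebacks earlier in this thread show paths from more than one Python version (Blender's bundled 3.10 next to a conda environment's 3.9 or 3.11 site-packages), it can help to list which versions a given traceback touches. Whether that mix is related to the crash is not established here; the helper below is a hypothetical sketch that only extracts the version numbers from path text.

```python
import re


def python_versions_in_paths(text):
    """Return the distinct 'pythonX.Y' versions mentioned in text, sorted numerically."""
    versions = set(re.findall(r"python(\d+\.\d+)", text))
    return sorted(versions, key=lambda v: tuple(int(part) for part in v.split(".")))


example = (
    'File "/home/.../blender/blender-3.5.1-linux-x64/3.5/python/lib/python3.10/multiprocessing/pool.py"\n'
    'File "/home/.../miniforge3/envs/staliani_bproc/lib/python3.9/site-packages/blenderproc/python/writer/BopWriterUtility.py"\n'
)
print(python_versions_in_paths(example))  # ['3.9', '3.10']
```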


6 participants