-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rasterio / GDALOpenEx performance slowdown in loops #11164
Comments
I've tried with rasterio at commit 7d1409318433934660cf98fef028e9097dc2ee45 and GDAL master and can't really a slow down. The timings you provide aren't super conclusive either about a slowdown (on my own machine this is quite noisy) |
Thank you for the quick reply Even. I retested from same rasterio commit as you, and noticed that 3.8.5 seems to have no slowdown, 3.9.0 have a bit of slowdown actually (starts out around 40ms per iteration and is at 50ms at the end of the 1000 iterations). With 3.9.1 it starts at around 40ms per iteration and at the end is above 120ms per iteration. Setting GDAL_DISABLE_READDIR_ON_OPEN to TRUE or EMPTY_DIR makes no difference. It should be noted that I'm using GDAL from conda-forge for all the tested versions, and rasterio from the Github repo. I debugged / tested by adding timing to the rasterio Cython code for the files and lines i mentioned, which is how i found out that it seems to be the call to GDALOpenEx that is the cause of the slowdown. It seems like the debugger for Cython is a pain in the *** to get working, so I haven't delved deeper. |
and what about adding a |
Same slowdown 🤔 This is the updated test script for reference: import os
os.environ["GDAL_DISABLE_READDIR_ON_OPEN"] = "TRUE"
import rasterio
import numpy as np
import time
import shutil
dest_np = np.random.rand(1000, 1000)
crs = "EPSG:32633"
dest_transform = rasterio.transform.from_origin(0, 0, 1, 1)
save_dir = "test_tiffs"
if os.path.exists(save_dir):
shutil.rmtree(save_dir)
os.makedirs(save_dir, exist_ok=True)
with rasterio.Env(GDAL_DISABLE_READDIR_ON_OPEN="TRUE"):
for i in range(1000):
img_save_path = f"{save_dir}/test_{i}.tif"
start_time = time.time()
with rasterio.open(img_save_path, mode="w", driver="GTiff", compress="LZW", dtype=rasterio.float32,
crs=rasterio.crs.CRS.from_string(crs), blockysize=256, transform=dest_transform,
width=dest_np.shape[1], height=dest_np.shape[0],
count=1) as dataset_out:
dataset_out.write(dest_np, 1)
end_time = time.time()
os.unlink(img_save_path)
print(f"Time to save: {end_time - start_time}s") I don't know if it would make sense to create an issue in the rasterio repo? I'm just assuming since the only change between the tests are different GDAL target versions when building rasterio, that the issue could be in GDAL. FYI this is being tested on Windows 11 PC with Ryzen 5900X and NVME SSD. |
My tests are done on Linux Ubuntu 20.04 | I don't know if it would make sense to create an issue in the rasterio repo? Maybe... ? If you are in a position to build GDAL from source, then doing a "git bisect" session could help a lot spotting the actual change that introduced the issue. |
@EmilRybergAkson It won't be useful to open a Rasterio issue. I don't access to a Windows computer and won't be able to reproduce this any more than @rouault can. |
Seems like the same problem I'm seeing. Good idea to plot it 😉 In the real use case it is processing 5k+ images that is later mosaiced into one orthophoto, so storing in RAM is not really feasible. We have had the same code running fine with 3.7.3, but upgrading to 3.9.1+ (3.9.3) we are encountering the issue. @rouault I just finished testing with compiling GDAL from tag 3.9.1 + 3.9.3, with only Proj as dependency, and using that to compile rasterio. The problem is non-existent with self compiled version of GDAL, so I guess it could be some weird issue with the GDAL versions on conda-forge. I will do some more testing on another Windows PC to see if the problem exists there. |
Could that possibly be related to the modularized GDAL builds on conda-forge that have started with GDAL 3.9.1 ... ? Do you have libgdal-XXXX conda packages installed ? If so perhaps try to uninstall them in case there would be a culprit among them... ? |
Why I suggested using memory output for debugging is to find out what is the slow part. Now Even and Howard have not seen that issue on Linux but you and me have found it on Windows. Maybe the reason is somehow related to the Windows file system. How to test the hypothesis? Remove the file system from the play and write into memory. Doesn't it feel worth having a try? Do I see or just imagine that there are some steps in the graph? For sure there are some much slower executions, what happens then? If this was Java I would think that they could show some carbage collection checkpoints. Notice that my graph is from OSGeo4W build, not from conda. |
I had uninstalled the libgdal-core (only one installed) in the conda environment where i tested with local build of GDAL. In my other conda environment where i tested local rasterio install + conda-forge gdal only has libgdal-core. I did some profiling with rasterio_with_gdal_conda_forge.speedscope.json You can open the files with https://www.speedscope.app/ (in browser) |
Changing the provided minimum reproducible example to save to the same file instead of a new file each iteration, the slowdown does not occur. Is there perhaps something keeping track of each unique file path that is iterated for each dataset open causing the slowdown? |
I couldn't make sense of the profile file. Can you try disabling drivers with GDAL_SKIP=foo,bar,baz where foo / bar /baz are driver short name to see if one driver might be the culprit ? The list keeping only the GTiff driver can be obtained with:
|
I realised there are stack frames missing from the profiling files, so they are not entirely accurate. I tried compiling rasterio with debugging info, but it does not seem to work with py-spy, so I'm a bit clueless on how to debug it further... Using GDAL_SKIP to only use GTiff driver doesn't fix the problem. It is a bit hard to debug when i don't have the PDB files for the build on conda-forge, and I can't replicate it with local minimal build of GDAL. But using Visual Studio profiler it also shows calls to tiff.dll that are slow. As far as I can see in the documentation tiff is included in the GDAL repo itself, so why would it call into tiff.dll in conda-forge GDAL and not my local build? |
You need to check in your GDAL's CMakeLists.txt the value of the GDAL_USE_TIFF_INTERNAL variable |
Building using external tiff, using the same tiff.dll that comes from conda-forge build does not reproduce the problem with local GDAL build. Where can i see how gdal and libgdal-core is built for conda-forge? Hopefully if I can replicate my local build to match the conda-forge build, I can reproduce and properly debug the culprit. |
GDAL: https://github.com/conda-forge/gdal-feedstock/blob/main/recipe/bld.bat and https://github.com/conda-forge/gdal-feedstock/blob/main/recipe/meta.yaml |
I have compiled GDAL using the same CMake flags in the bld.bat, but it still works fine with my locally built GDAL dll, but if i switch it out for the one from conda-forge the problem is there, i am using the 3rd party DLLs shipped with libcore-gdal from conda-forge to build GDAL locally, to keep that the same. I am building from tag v.3.9.3 FYI, and using libgdal-core 3.9.3 from conda-forge. Any idea to what could be different yet? |
What is the bug?
Using rasterio 1.3.11 compiled against GDAL 3.9.1 and later results in slowdown when creating datasets in a loop. It does not seem to be an issue in 3.9.0 and earlier. I have tried debugging to figure out what the problem is, but I have not been able to pinpoint it exactly, and what i have found has confused me even more...
The problem is line 317 in rasterio
_io.pyx
in the_delete_dataset_if_exists
which is called when a dataset is opened in "w" mode. The call toopen_dataset
is what slows down, specificallyGDALOpenEx
in_base.pyx
line 219.In some mock code i wrote, calling GDALOpenEx in both C++ and Python in a loop, the problem does not seem to occur. As far as I can tell there have also not been any changes to GDALOpenEx between 3.9.0 and 3.9.1, but could there be some error handling or other overhead that rasterio uses that is the root problem?
Steps to reproduce the issue
Running the following test code should show the problem:
With versions including and after GDAL 3.9.1 it should gradually become slower each iteration. This is not the case with GDAL 3.9.0 and earlier.
Sample from my PC with GDAL 3.9.1:
Versions and provenance
Python 3.12, but problem is also present in other Python versions (tested on 3.9 also)
Windows 11. Rasterio version 1.3.11. Tested GDAL versions:
3.7.3 - Works
3.8.5 - Works
3.9.0 - Works
3.9.1 - Slowdown
3.9.3 - Slowdown
Additional context
No response
The text was updated successfully, but these errors were encountered: