Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The azure-core-cpp 1.11.0 update broke tiledb #228

Closed
jorisvandenbossche opened this issue Jan 13, 2024 · 26 comments
Closed

The azure-core-cpp 1.11.0 update broke tiledb #228

jorisvandenbossche opened this issue Jan 13, 2024 · 26 comments

Comments

@jorisvandenbossche
Copy link
Member

With the recent update of azure-core-cpp to 1.11.0 (conda-forge/azure-core-cpp-feedstock#10), this broke tiledb, or at least through the usage of tiledb in libgdal.

Illustrating the usage of libgdal by reading a file with geopandas, which uses libgdal under the hood, this segfaults with a backtrace that points to tiledb / azure-core-cpp:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff7d2a3fe in free () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff7d2a3fe in free () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fffacc2236a in Azure::Core::Http::CurlTransport::~CurlTransport() ()
   from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../../././libazure-core.so
#2  0x00007fffacc58092 in Azure::Core::Http::Policies::_internal::TransportPolicy::~TransportPolicy() ()
   from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../../././libazure-core.so
#3  0x00007fffaccf7d5a in std::_Sp_counted_ptr_inplace<Azure::Core::Http::_internal::HttpPipeline, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../../././libazure-storage-blobs.so
#4  0x00007fffb03146f7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() ()
   from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../.././libtiledb.so.2.19
#5  0x00007fffb04085bd in void tiledb::common::tiledb_delete<Azure::Storage::Blobs::BlobServiceClient>(Azure::Storage::Blobs::BlobServiceClient*) ()
   from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../.././libtiledb.so.2.19
#6  0x00007fffb04088cc in tiledb::sm::Azure::~Azure() ()
   from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../.././libtiledb.so.2.19
#7  0x00007fffb0c46820 in std::_Sp_counted_ptr_inplace<tiledb_ctx_handle_t, tiledb::common::GovernedAllocator<tiledb_ctx_handle_t, tiledb::common::(anonymous namespace)::TiledbTracedAllocator, tiledb::common::Governor>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() ()
   from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../.././libtiledb.so.2.19
#8  0x00007fffb0c475c2 in tiledb_ctx_free () from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../.././libtiledb.so.2.19
#9  0x00007fffb38016b4 in tiledb::Context::free(tiledb_ctx_handle_t*) ()
   from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../../libgdal.so.34
#10 0x00007fffb38012de in std::_Sp_counted_deleter<tiledb_ctx_handle_t*, void (*)(tiledb_ctx_handle_t*), std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../../libgdal.so.34
#11 0x00007fffb3806c8e in TileDBDataset::Identify(GDALOpenInfo*) ()
   from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../../libgdal.so.34
#12 0x00007fffb3e81246 in GDALOpenEx () from /home/joris/miniconda3/envs/test-geo/lib/python3.11/site-packages/fiona/../../../libgdal.so.34
#13 0x00007fffac571c3f in __pyx_f_5fiona_6ogrext_gdal_open_vector ()

I don't know if tiledb should upper pin the azure-core-cpp dependency, or be re-rendered?

We have failing CI, but the above crash was reproduced locally by running mamba create -n test-crash libgdal and then trying to open a zipped shapefile with ogrinfo path/to/file

@jorisvandenbossche
Copy link
Member Author

cc @conda-forge/gdal this breaks libgdal.

Does anyone know if patching tiledb to add a azure-core-cpp<1.11 pin (https://github.com/conda-forge/conda-forge-repodata-patches-feedstock) is the best way forward to resolve this short term? (or mark that latest release of azure-core-cpp as broken?)

@ocefpaf
Copy link
Member

ocefpaf commented Jan 14, 2024

or mark that latest release of azure-core-cpp as broken?

I'm ok with this until we figure out a better pin for tiledb.

@jorisvandenbossche
Copy link
Member Author

Opened a PR for that at conda-forge/admin-requests#911

@ihnorton
Copy link
Contributor

Thanks, we will take a look ASAP. (cc @KiterLuc @teo-tsirpanis)

@jdblischak
Copy link
Member

Looking at the azure-core-cpp 1.11.0 changelog and comparing to the error message, I don't see anything obvious that should have broken libtiledb. The only item I see that looks somewhat related is:

Added 'OPTIONS' HTTP method to Azure::Core::Http::HttpMethod enum.

since the traceback includes various calls to Azure::Core::Http

@AchimGaedkeLynker
Copy link

Stumbled over this problem yesterday... Thanks for being on top of this that fast from the packaging side.

Anyone reaching out to https://github.com/TileDB-Inc/TileDB for advice?
I couldn't spot anything obvious in their logs/config - but there are some nightly tests failing.

@Shelnutt2
Copy link
Contributor

Anyone reaching out to https://github.com/TileDB-Inc/TileDB for advice?
I couldn't spot anything obvious in their logs/config - but there are some nightly tests failing.

Hello @AchimGaedkeLynker , we over at TileDB are aware. @ihnorton and @jdblischak are working on this from ourside. The azure-core-cpp breakage was completely unexpected and not inline with the Azure team's own commitments or their release notes. In additional to fixing the packaging here, we are getting clarification from the azure-core-cpp team about why this release of azure-core-cpp caused issues.

@AchimGaedkeLynker
Copy link

@Shelnutt2 - thanks a lot - yeah, yesterday I had trouble to pin-point the failure due to similar reasons/considerations till I actually pulled gdb out of the closet to get the stack trace.

All good on my side. I can confirm that avoiding azure-core-cpp 1.11.0 "solves the problem".

@jdblischak
Copy link
Member

I've submitted the repodata patch in conda-forge/conda-forge-repodata-patches-feedstock#639. Please review to expedite the process

@jdblischak
Copy link
Member

Is the problem with azure-core-cpp 1.11.0, tiledb, and gdal specific to linux architectures? Only the linux-* binaries were marked as broken in conda-forge/admin-requests#911. Thus the non-linux binaries of the recently built 2.18.4 are linked against azure-core-cpp 1.11.0, which is complicating my repodata patch in conda-forge/conda-forge-repodata-patches-feedstock#639

I see two potential solutions:

  1. If azure-core-cpp 1.11.0 is only a problem on linux, then I will update my repodata patch to only patch linux binaries
  2. If azure-core-cpp 1.11.0 is a problem on all platforms, then I think we need to mark the non-linux tiledb 2.18.4 binaries as broken and rebuild them with the recipe pin from Require azure-core-cpp <1.11 #229 applied

Another option which I don't recommend would be to patch the repodata to install azure-core-cpp 1.10 when they were built against 1.11, but that seems dangerous to me.

@jdblischak
Copy link
Member

@jorisvandenbossche you only marked the linux conda binaries as broken. Do you have evidence that geopandas/libgdal/tiledb are able to use azure-core-cpp 1.11 on osx and windows? I see that gdal-feedstock builds libgdal for osx-64, osx-arm64, and win-64. These currently install with azure-core-cpp 1.11.0, eg

# win-64
mamba create --dry-run -n test --override-channels -c conda-forge tiledb libgdal geopandas
## + azure-core-cpp                         1.11.0  h249a519_0             conda-forge      482kB
## + tiledb                                 2.19.0  h8e52ccb_0             conda-forge        4MB
## + python                                 3.12.1  h2628c8c_1_cpython     conda-forge       16MB
## + libgdal                                 3.8.3  h576f4c1_0             conda-forge        9MB
## + gdal                                    3.8.3  py312h6c4443f_0        conda-forge        2MB
## + geopandas-base                         0.14.2  pyha770c72_0           conda-forge        1MB
## + geopandas                              0.14.2  pyhd8ed1ab_0           conda-forge        8kB

@jdblischak
Copy link
Member

We've addressed this Issue from the conda side:

xref: conda-forge/admin-requests#911 (comment)

Progress to enable libtiledb compatibility with azure-core-cpp 1.11+ will take place upstream in the source repo

@hoxbro
Copy link

hoxbro commented Jan 31, 2024

Hi, I just saw our CI fails because of the new patch of azure-core-cpp 1.11.0. A MRE:

import geodatasets as gds
import geopandas as gpd

file = gds.get_path('geoda airbnb')
gpd.read_file(file)

Will output a segmentation fault.

❯ mamba list | grep -e tiledb -e azure-core-cpp
azure-core-cpp            1.11.0               h91d86a7_1    conda-forge
tiledb                    2.19.1               h4386cac_0    conda-forge

@jdblischak
Copy link
Member

@hoxbro Thanks for reporting!

I see what happened. We made the max_pin more strict for azure-core-cpp 1.11, but this wasn't backported to 1.10. Thus tiledb 2.19.1 was built against 1.10.3 but allows installing with 1.11.0.

I'll submit a patch today to fix this. I apologize for the inconvenience

@jdblischak jdblischak reopened this Jan 31, 2024
@hoxbro
Copy link

hoxbro commented Jan 31, 2024

There is absolutely no need to apologize; thank you for the quick response.

@jdblischak
Copy link
Member

Also, the short-term solution to immediately fix the problem is to pin azure-core-cpp=1.10 in your conda environments. Once I get everything fixed, this will no longer be necessary

@jdblischak
Copy link
Member

Is the problem with azure-core-cpp 1.11.0, tiledb, and gdal specific to linux architectures?

Quick update. I now have evidence that this is a linux-only problem. In our nightly conda builds, the segmentation fault with azure-core-cpp 1.11.0 only occured on the linux-* architectures

TileDB-Inc/conda-forge-nightly-controller#44
https://dev.azure.com/TileDB-Inc/CI/_build/results?buildId=37659&view=results

@jorisvandenbossche
Copy link
Member Author

@jdblischak and sorry about not answering your question about that before. I did not have any evidence, apart from that our CI of geopandas (which covers all main platforms) was only failing for linux, so that was the reason I was assuming only linux was affected. That seems to be confirmed by your observation as well.

@teo-tsirpanis
Copy link
Member

Have we opened an issue to https://github.com/Azure/azure-sdk-for-cpp?

rouault added a commit to rouault/gdal that referenced this issue Jan 31, 2024
by not identifying /vsi file systems it doesn't handle

Relates to conda-forge/tiledb-feedstock#228
rouault added a commit to rouault/gdal that referenced this issue Jan 31, 2024
by not identifying /vsi file systems it doesn't handle

Relates to conda-forge/tiledb-feedstock#228
@rouault
Copy link

rouault commented Jan 31, 2024

Somewhat related: improvement in GDAL to not trigger TileDB when not needed: OSGeo/gdal#9170

@martinfleis
Copy link

@jdblischak It seems that conda-forge/azure-core-cpp-feedstock#12 did not solve the issue and caused new environments to segfault again - https://github.com/geopandas/geodatasets/actions/runs/7734545137/job/21088724150

@jdblischak
Copy link
Member

did not solve the issue and caused new environments to segfault again

That was only the first step. The second step is to send a repodata patch, which I will do this morning. For immediate green CI, please pin azure-core-cpp=1.10 in your conda env

@jdblischak
Copy link
Member

The second (and with any luck final) step is conda-forge/conda-forge-repodata-patches-feedstock#649

@jdblischak
Copy link
Member

Ok, the failures that started yesterday with the new tiledb 2.19.1 should be fixed now. I completed the following:

Thus the existing 2.19.1 binaries should now install azure-core-cpp 1.10.3 at runtime. And any future tiledb binaries should install the updated azure-core-cpp 1.10.3 at build time, which will export a runtime requirement of azure-core-cpp >=1.10.3,<1.11

@jdblischak
Copy link
Member

Have we opened an issue to https://github.com/Azure/azure-sdk-for-cpp?

@teo-tsirpanis Not to my knowledge. Here's what we know so far about the bug:

  • If libtiledb that is built against azure-core-cpp 1.10.3 is installed into an environment with azure-core-cpp 1.11.0, it will fail with a segmentation fault when called via GDAL
  • If libtiledb that is built against azure-core-cpp 1.10.3 is installed into an environment with azure-core-cpp 1.11.0, it will fail with a segmentation fault when trying link another library against it
  • Both of the above segmentation faults only happen on linux architectures

In additional to fixing the packaging here, we are getting clarification from the azure-core-cpp team about why this release of azure-core-cpp caused issues.

@Shelnutt2 where is this discussion taking place? Is there an open issue? Have we learned anything from their team about what could be causing the problem?

@jdblischak
Copy link
Member

And any future tiledb binaries should install the updated azure-core-cpp 1.10.3 at build time, which will export a runtime requirement of azure-core-cpp >=1.10.3,<1.11

Actually I can be more decisive about this. Not "should install" but "will install". Our nightly feedstock builds are passing again because the backported run exports fix ensures that azure-core-cpp 1.10.3 is installed at runtime (TileDB-Inc/conda-forge-nightly-controller#44 (comment))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants