⚠️ Nightly upstream-dev CI failed ⚠️ #87

Open
github-actions bot opened this issue Oct 31, 2024 · 6 comments
@github-actions

Workflow Run URL

Python 3.11 Test Summary
xvec/tests/test_zonal_stats.py::test_structure[iterate]: TypeError: Cannot interpret '<geopandas.array.GeometryDtype object at 0x7fc9236be390>' as a data type
xvec/tests/test_zonal_stats.py::test_match: TypeError: Cannot interpret '<geopandas.array.GeometryDtype object at 0x7fc9236be390>' as a data type
xvec/tests/test_zonal_stats.py::test_dataset[iterate]: TypeError: Cannot interpret '<geopandas.array.GeometryDtype object at 0x7fc9236be390>' as a data type
xvec/tests/test_zonal_stats.py::test_dataarray[iterate]: TypeError: Cannot interpret '<geopandas.array.GeometryDtype object at 0x7fc9236be390>' as a data type
xvec/tests/test_zonal_stats.py::test_stat[iterate]: TypeError: Cannot interpret '<geopandas.array.GeometryDtype object at 0x7fc9236be390>' as a data type
xvec/tests/test_zonal_stats.py::test_all_touched[iterate]: TypeError: Cannot interpret '<geopandas.array.GeometryDtype object at 0x7fc9236be390>' as a data type
xvec/tests/test_zonal_stats.py::test_n_jobs: TypeError: Cannot interpret '<geopandas.array.GeometryDtype object at 0x7fc9236be390>' as a data type
xvec/tests/test_zonal_stats.py::test_callable[iterate]: TypeError: Cannot interpret '<geopandas.array.GeometryDtype object at 0x7fc9236be390>' as a data type
xvec/tests/test_zonal_stats.py::test_multiple[iterate]: TypeError: Cannot interpret '<geopandas.array.GeometryDtype object at 0x7fc9236be390>' as a data type
@github-actions github-actions bot added the CI label Oct 31, 2024
@scottyhq

This seems to be caused by a change in Xarray between 2024.09 and 2024.10. I had a quick look for changes to indexing.py in xarray, but it's not obvious to me where this is coming from. Full traceback below. cc @benbovy @keewis

xvec/tests/test_zonal_stats.py:318: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
xvec/accessor.py:1102: in zonal_stats
    result = _zonal_stats_iterative(
xvec/zonal.py:194: in _zonal_stats_iterative
    vec_cube = xr.concat(
../../miniforge3/envs/xvec-dev/lib/python3.12/site-packages/xarray/core/concat.py:277: in concat
    return _dataset_concat(
../../miniforge3/envs/xvec-dev/lib/python3.12/site-packages/xarray/core/concat.py:703: in _dataset_concat
    index_vars = index.create_variables({dim_name: dim_var})
../../miniforge3/envs/xvec-dev/lib/python3.12/site-packages/xarray/core/indexes.py:720: in create_variables
    data = PandasIndexingAdapter(self.index, dtype=self.coord_dtype)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <[AttributeError("'PandasIndexingAdapter' object has no attribute '_dtype'") raised in repr()] PandasIndexingAdapter object at 0x12570b6a0>
array = Index([                                                                                                               ...-59.14999999999992 -51.49999999999997, -58.55000000000007 -51.100000000000016)],
      dtype='object', name='geometry')
dtype = <geopandas.array.GeometryDtype object at 0x1254a1fa0>

    def __init__(self, array: pd.Index, dtype: DTypeLike = None):
        from xarray.core.indexes import safe_cast_to_index
    
        self.array = safe_cast_to_index(array)
    
        if dtype is None:
            self._dtype = get_valid_numpy_dtype(array)
        else:
>           self._dtype = np.dtype(dtype)
E           TypeError: Cannot interpret '<geopandas.array.GeometryDtype object at 0x1254a1fa0>' as a data type

../../miniforge3/envs/xvec-dev/lib/python3.12/site-packages/xarray/core/indexing.py:1674: TypeError

@benbovy
Member

benbovy commented Nov 26, 2024

Thanks @scottyhq.

I further investigated the issue and it may be related to this change in Xarray: pydata/xarray#9520

Before that change, geopandas.array.GeometryDtype was converted to np.dtype(object), but now it is propagated to the xarray.Variable.

Interestingly, np.dtype() accepts pandas extension types only as classes (not as objects) while it accepts numpy dtypes either as classes or as objects.

>>> import numpy as np
>>> import geopandas
>>> np.dtype(geopandas.array.GeometryDtype)
dtype('O')
>>> np.dtype(geopandas.array.GeometryDtype())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Cannot interpret '<geopandas.array.GeometryDtype object at 0x140b2f9d0>' as a data type
>>> np.dtype(type(np.dtype("O")))
dtype('O')
>>> np.dtype(np.dtype("O"))
dtype('O')

An easy workaround is to force the conversion to np.dtype("O") before creating the PandasIndex (from GeometryIndex).

A proper fix would be to propagate pandas extension types in xarray.core.indexing.PandasIndexingAdapter as well.
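
The coercion described above can be sketched in isolation. This is only an illustration, not the change made in xvec or xarray: `to_numpy_dtype` is a hypothetical helper name, and `pd.CategoricalDtype` stands in for `geopandas.array.GeometryDtype` so the snippet runs without geopandas.

```python
import numpy as np
import pandas as pd


def to_numpy_dtype(dtype):
    """Hypothetical helper: return a dtype that np.dtype() can represent."""
    # Instances of pandas extension dtypes are rejected by np.dtype(),
    # so fall back to the generic object dtype for them.
    if isinstance(dtype, pd.api.extensions.ExtensionDtype):
        return np.dtype(object)
    # Everything else (strings, numpy dtypes, scalar types) goes through as usual.
    return np.dtype(dtype)


# Assumption: CategoricalDtype behaves like GeometryDtype for this purpose,
# since both are pandas extension dtypes.
ext_dtype = pd.CategoricalDtype()
print(to_numpy_dtype(ext_dtype))
print(to_numpy_dtype("float64"))
```

Falling back to object dtype discards the extension type information, which is the trade-off between the workaround and the proper fix mentioned above.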

@coolzhao

coolzhao commented Nov 28, 2024

Hi all, I ran into the same error when using the iterative method for zonal statistics. After some exploration, I found that the problem comes from the xr.concat call in the _zonal_stats_iterative function.

xvec/xvec/zonal.py

Lines 194 to 197 in b3bef06

vec_cube = xr.concat(
    zonal,  # type: ignore
    dim=xr.DataArray(geometry, name=name, dims=name),
).xvec.set_geom_indexes(name, crs=crs)

Here, dim is passed as a DataArray; ideally, its name is used as the dimension to concatenate along, and its values are added as a coordinate. However, xr.concat tries to create index_vars using the dtype of dim (GeometryDtype), which PandasIndexingAdapter does not support, and that is where the error occurs.
https://github.com/pydata/xarray/blob/7fd572d374df45b863c54e380323d898d060db5a/xarray/core/concat.py#L701-L703

My current solution is to concatenate along a plain integer Index, then assign the geometries as coordinates and rename the dimension afterward. Code below.

vec_cube = xr.concat(
    zonal,  # type: ignore
    dim=pd.Index(range(len(geometry)), name="geom_index"),
).assign_coords(geom_index=geometry).rename(geom_index=name).xvec.set_geom_indexes(name, crs=crs)

Happy to do a PR if needed. Cheers.
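
The concat-then-relabel pattern above can be tried without xvec or geopandas. The following is a minimal sketch under stated assumptions: the `zonal` list is made-up data, and an object array of WKT-like strings stands in for the real geometry array, so it only demonstrates the xarray calls, not the GeometryDtype behaviour itself.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up per-geometry results, one DataArray per zone.
zonal = [xr.DataArray(np.array([float(i), float(i) + 1.0]), dims="band") for i in range(3)]

# Stand-in for the geometry array (assumption: plain strings instead of shapely objects).
labels = np.array(["POINT (0 0)", "POINT (1 1)", "POINT (2 2)"], dtype=object)
name = "geometry"

cube = (
    # Concatenate along a plain integer index so no extension dtype is involved.
    xr.concat(zonal, dim=pd.Index(range(len(labels)), name="geom_index"))
    # Swap the integer labels for the real coordinate values.
    .assign_coords(geom_index=labels)
    # Give the dimension its final name.
    .rename(geom_index=name)
)

print(cube.dims)
```

In xvec the chain would end with `.xvec.set_geom_indexes(name, crs=crs)`, as in the snippet above.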

@martinfleis
Member

I honestly think this is a regression in xarray and should probably be fixed there. It did work and now, after some change, does not. We could come up with a workaround but if it can be solved there, I think it would be preferable.

@benbovy any idea how feasible it is to fix this in xarray? If it is a pain, we can apply the patch @coolzhao suggested.

@benbovy
Member

benbovy commented Nov 29, 2024

The fix in xarray may be pretty straightforward, although I'm not super familiar with pandas extension types, and I don't know whether supporting them in xarray.core.indexing.PandasIndexingAdapter would introduce other regressions in Xarray.

In the meantime, if the issue is critical here, a quick and dirty fix in xvec would be to force the coord_dtype=np.dtype(object) attribute of the PandasIndex (encapsulated by GeometryIndex) just after creating it. This might have an impact on Xarray / GeoPandas roundtrips, though (I don't know).

@benbovy
Member

benbovy commented Nov 29, 2024

> Interestingly, np.dtype() accepts pandas extension types only as classes (not as objects) while it accepts numpy dtypes either as classes or as objects.

It is also possible that the ultimate fix would be in numpy :-)
