Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inconsistent 1D interp on 1D and ND DataArrays with NaNs #9702

Open
5 tasks done
delgadom opened this issue Oct 31, 2024 · 5 comments · May be fixed by #9813
Open
5 tasks done

inconsistent 1D interp on 1D and ND DataArrays with NaNs #9702

delgadom opened this issue Oct 31, 2024 · 5 comments · May be fixed by #9813

Comments

@delgadom
Copy link
Contributor

What happened?

May be a duplicate of #5852, and as stated there, interp with nans isn't really supported. But I noticed that in some cases xr.DataArray.interp drops valid data if there are NaNs in the array, and that this behavior is different depending on the number of dimensions, even if only a single dimension is specified (so interp1d should be used in every case).

Not sure if this warrants a different issue, but the NaN handling appears to be different depending on the number of dimensions, even if we're interpolating along a single dimension (so should be essentially broadcasting interp1d:

Here's a MRE:

import xarray as xr
import numpy as np

a = xr.DataArray(
    [np.nan, 1, 2],
    dims=["x"],
    coords=[range(0, 5, 2)],
)

print("a:\n")
print(a)

b = xr.DataArray(
    [[np.nan, 1, 2]],
    dims=["y", "x"],
    coords=[["first"], range(0, 5, 2)],
)

print("\n\nb:\n")
print(b)

print("\n\na.interp(x=range(5)):\n")
print(a.interp(x=range(5)))

print("\n\nb.interp(x=range(5)):\n")
print(b.interp(x=range(5)))

print("\n\nxarray version:")
print(xr.__version__)

gives

a:

<xarray.DataArray (x: 3)> Size: 24B
array([nan,  1.,  2.])
Coordinates:
  * x        (x) int64 24B 0 2 4


b:

<xarray.DataArray (y: 1, x: 3)> Size: 24B
array([[nan,  1.,  2.]])
Coordinates:
  * y        (y) <U5 20B 'first'
  * x        (x) int64 24B 0 2 4


a.interp(x=range(5)):

<xarray.DataArray (x: 5)> Size: 40B
array([nan, nan, 1. , 1.5, 2. ])
Coordinates:
  * x        (x) int64 40B 0 1 2 3 4


b.interp(x=range(5)):

<xarray.DataArray (y: 1, x: 5)> Size: 40B
array([[nan, nan, nan, 1.5, 2. ]])
Coordinates:
  * y        (y) <U5 20B 'first'
  * x        (x) int64 40B 0 1 2 3 4


xarray version:
2024.10.0

Note that the second interpolation case, in which the array is 2D, drops a valid value (1, in position x=2), even though it is present in the original array. The first case correctly interpolates between 1 and 2 and does not drop the first value.

Apologies - I don't have time now to diagnose whether this is an xarray issue with the new 2D interp handling or simply a scipy interp1d broadcasting issue, but it was certainly unexpected!

What did you expect to happen?

No response

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:41) [Clang 15.0.7 ]
python-bits: 64
OS: Darwin
OS-release: 24.0.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.0
libnetcdf: 4.9.2

xarray: 2024.10.0
pandas: 2.2.3
numpy: 1.24.3
scipy: 1.14.1
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: 3.8.0
zarr: 2.18.3
cftime: 1.6.2
nc_time_axis: None
iris: None
bottleneck: 1.3.7
dask: 2024.8.0
distributed: 2024.8.0
matplotlib: 3.7.1
cartopy: 0.24.0
seaborn: 0.12.2
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: None
sparse: None
flox: 9.11
numpy_groupies: 0.11.2
setuptools: 67.7.2
pip: 23.1.2
conda: None
pytest: None
mypy: None
IPython: 8.14.0
sphinx: None

@delgadom delgadom added bug needs triage Issue that has not been reviewed by xarray team member labels Oct 31, 2024
@dcherian dcherian removed the needs triage Issue that has not been reviewed by xarray team member label Nov 6, 2024
@dcherian
Copy link
Contributor

dcherian commented Nov 6, 2024

@hollymandel do you have time to take a quick look here? Simply diagnosing the core problem would be quite helpful

@hollymandel
Copy link
Contributor

hollymandel commented Nov 9, 2024

This behavior seems to come from scipy:

import scipy

print(scipy.interpolate.interp1d(x = [0,2,4], y =[np.nan, 1,2])([0,1,2,3,4])) # [nan nan 1.  1.5 2. ]
print(scipy.interpolate.interp1d(x = [0,2,4], y =[[np.nan, 1,2]])([0,1,2,3,4])) # [[nan nan nan 1.5 2. ]]

@hollymandel
Copy link
Contributor

I think adding a warning (as suggested in #5852) is reasonable, can do the PR if others agree.

@dcherian
Copy link
Contributor

dcherian commented Nov 9, 2024

Nice, I think a warning that suggests .dropna would be good.

@hollymandel hollymandel linked a pull request Nov 24, 2024 that will close this issue
3 tasks
@hollymandel
Copy link
Contributor

hollymandel commented Nov 24, 2024

Here is a draft PR: #9813

I guess the maintainers should decide whether this is too noisy? I'm not really sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants