How to handle nan values at different depth layers? (Ocean data) #114

Open
inakbk opened this issue Nov 7, 2024 · 0 comments

Describe the bug
We have used anemoi-datasets to create a Zarr dataset from ocean model data (Norkyst). The data contain NaN values over land, and we are unsure how to handle them. One possibility is to use a mask to cut out the NaN values. However, since the depth of the ocean varies, the regions that contain NaN values are larger at depth than at the surface (see the example below).

Do you have suggestions for how we should handle these NaN values? Is there existing functionality that we have missed and could use? Or do you have a suggestion for what the best implementation could be?

Example
In this example we only look at the temperature at two depths, 1 m and 300 m: temperature_1 and temperature_300. Let's first remove the NaN values at 1 m, for example like this:

import numpy as np
import anemoi.datasets as ad

z = ad.open_dataset('./data/norkyst_v3_2024_01_01-02.zarr')

indx_1 = z.name_to_index['temperature_1']
temp_1 = z[0,indx_1,0,:]

mask = np.isfinite(temp_1)

temp_nonan = temp_1[mask]
lat_nonan = z.latitudes[mask]
lon_nonan = z.longitudes[mask]

so that the array is now smaller, with all data over land removed. At 300 m depth there are fewer valid temperature values (i.e. more NaN values to remove):

indx_3 = z.name_to_index['temperature_300']
temp_3 = z[0,indx_3,0,:]
mask3 = np.isfinite(temp_3)

temp_nonan3 = temp_3[mask3]
lat_nonan3 = z.latitudes[mask3]
lon_nonan3 = z.longitudes[mask3]

so that len(temp_nonan) > len(temp_nonan3) and lat_nonan != lat_nonan3, since mask != mask3. Applying a separate mask at each depth layer (we have 16 of them) leaves us with a different set of lat and lon arrays at each depth. Is that a good solution?
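One alternative to per-depth coordinate arrays can be sketched with plain NumPy on synthetic data. This is not an anemoi-datasets feature, only an illustration: land points are dropped once using the surface mask, so all depths share one set of lat/lon points, and a boolean validity mask per depth is carried alongside the data. The sketch assumes that every point valid at depth is also valid at the surface; the shapes, values, and variable names are made up.

```python
import numpy as np

# Synthetic stand-in for one time step: (n_levels, n_grid_points).
# Shapes and values are illustrative only.
rng = np.random.default_rng(0)
n_levels, n_points = 3, 8
temp = rng.normal(5.0, 1.0, size=(n_levels, n_points))

# Land is NaN at every level; deeper levels lose more points to the sea floor.
temp[:, :2] = np.nan      # land
temp[1, 2:4] = np.nan     # below the sea floor at the middle level
temp[2, 2:6] = np.nan     # below the sea floor at the deepest level

# One boolean mask per level, all defined on the same grid.
valid = np.isfinite(temp)                 # shape (n_levels, n_points)

# Drop land once, using the surface level (assumed to be the largest valid set).
surface_ocean = valid[0]
temp_ocean = temp[:, surface_ocean]       # shape (n_levels, n_ocean)
valid_ocean = valid[:, surface_ocean]

# Fill the remaining NaNs with a placeholder and keep the mask next to the data,
# so that statistics or a loss function can ignore the filled points.
temp_filled = np.where(valid_ocean, temp_ocean, 0.0)

# Example: per-level mean over valid points only.
level_mean = (temp_filled * valid_ocean).sum(axis=1) / valid_ocean.sum(axis=1)
print(level_mean)
```

With this layout the latitudes and longitudes only need to be subset once (with surface_ocean), at the cost of storing placeholder values below the sea floor.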

Version number

I am using the following versions of the anemoi packages (pip freeze):

anemoi-datasets==0.5.7
anemoi-graphs==0.3.0
anemoi-models==0.3.0
anemoi-training==0.2.0
anemoi-utils==0.4.0

To Reproduce
Steps to reproduce the behavior:

  1. Use the following recipe, norkyst-one.yaml:
dates:
  start: 2024-01-01T00:00:00Z
  end: 2024-01-01T23:30:00Z
  frequency: 1h
resolution: o96 
input:
    join:
      - netcdf: 
          path: /lustre/storeB/project/fou/hi/oper/norkyst_v3/forecast/his_zdepths/2024/01/01/norkyst800_his_zdepth_*_m00_AN.nc
          param: [temperature] #, salinity, u_eastward, v_northward]
statistics:
  allow_nans: [temperature]
  2. Run anemoi-datasets create norkyst-one.yaml data/norkyst_v3_2024_01_01.zarr
  3. There is no error, just a zarr dataset containing lots of NaNs
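To put a number on "lots of NaNs", a rough check could compute the NaN fraction per variable. The sketch below uses a synthetic NumPy array laid out as (dates, variables, ensemble, grid points), which matches the indexing used in the example above; the values and the amount of missing data are invented for illustration.

```python
import numpy as np

# Synthetic array with the layout (dates, variables, ensemble, grid points).
names = ["temperature_1", "temperature_300"]  # illustrative variable names
data = np.ones((2, len(names), 1, 10))
data[:, 0, :, :3] = np.nan   # 3 of 10 points missing near the surface
data[:, 1, :, :7] = np.nan   # 7 of 10 points missing at depth

# Fraction of NaN values per variable, averaged over dates, ensemble and grid.
nan_fraction = np.isnan(data).mean(axis=(0, 2, 3))

for name, frac in zip(names, nan_fraction):
    print(f"{name}: {frac:.0%} NaN")
```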

URL to sample input data
https://thredds.met.no/thredds/catalog/fou-hi/norkystv3_his_files/2024/01/01/catalog.html

@inakbk changed the title from "Is there an easy way to drop all nan values in a dataset?" to "How to handle nan values at different depth layers? (Ocean data)" on Nov 7, 2024