Describe the bug
We have used anemoi-datasets to create a Zarr dataset from ocean model data (Norkyst). The data contains NaN values over land, and we are unsure how to handle them. One possibility is to use a mask to cut out the NaN values. However, since the depth of the ocean varies, the regions that contain NaN values are larger at depth than at the surface (see the example below).
Do you have suggestions for how we should handle these NaN values? Is there existing functionality that we have overlooked and could use? Or do you have a suggestion for what the best implementation would be?
Example
In this example we only look at the temperature at two depths, 1 m and 300 m: temperature_1 and temperature_300. Let's first remove the NaN values at 1 m, for example like this:
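```python
import numpy as np
import anemoi.datasets as ad

z = ad.open_dataset('./data/norkyst_v3_2024_01_01-02.zarr')

# Index of the 1 m temperature variable
indx_1 = z.name_to_index['temperature_1']
# First date, temperature_1, first ensemble member, all grid points
temp_1 = z[0, indx_1, 0, :]

# True where the value is finite (ocean), False where it is NaN (land)
mask = np.isfinite(temp_1)
temp_nonan = temp_1[mask]
lat_nonan = z.latitudes[mask]
lon_nonan = z.longitudes[mask]
```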
This reduces the size of the array by removing all data over land. At 300 m depth there are fewer temperature values (i.e. more NaN values to remove):
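A minimal sketch of that step, assuming it mirrors the 1 m code with only the variable name changed. The helper `drop_nan_layer` is hypothetical and not part of anemoi-datasets:

```python
import numpy as np

def drop_nan_layer(z, name, date=0, member=0):
    """Return the finite values of one variable, the matching lat/lon, and the mask.

    Hypothetical helper; assumes z is indexed as [date, variable, ensemble, grid]
    and exposes name_to_index, latitudes and longitudes as in the 1 m example.
    """
    idx = z.name_to_index[name]
    values = z[date, idx, member, :]
    mask = np.isfinite(values)  # False wherever the layer has NaN
    return values[mask], z.latitudes[mask], z.longitudes[mask], mask

# Usage with the dataset opened above (not run here):
# temp_nonan3, lat_nonan3, lon_nonan3, mask3 = drop_nan_layer(z, 'temperature_300')
```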
so that len(temp_nonan) > len(temp_nonan3) and lat_nonan != lat_nonan3, since mask != mask3. By applying a separate mask at each depth layer (we have 16 of them), we end up with a different set of lat and lon arrays at each depth. Is that a good solution?
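To make the trade-off concrete, here is a small sketch on synthetic data (toy values, not Norkyst output, and not an anemoi-datasets feature). It contrasts per-layer masking, which gives a different number of points at each depth, with keeping one shared grid and carrying a boolean mask per layer. All array names are assumptions for illustration.

```python
import numpy as np

# Toy field: 3 depth layers x 6 grid points; NaN marks land or sea floor.
nan = np.nan
temp = np.array([
    [8.0, 7.5, 7.9, nan, 8.1, 7.7],   # shallow: one land point
    [6.0, nan, 6.2, nan, 6.1, 5.9],   # mid depth
    [nan, nan, 4.0, nan, 4.2, nan],   # deep: mostly below the sea floor
])
lat = np.array([60.0, 60.5, 61.0, 61.5, 62.0, 62.5])

# Option A: separate mask per layer -> ragged result, one lat array per depth.
masks = np.isfinite(temp)                  # shape (layers, points)
lat_per_layer = [lat[m] for m in masks]
points_per_layer = masks.sum(axis=1)

# Option B: keep one shared grid (points valid in at least one layer)
# and carry the per-layer masks alongside the data.
shared = masks.any(axis=0)
temp_shared = temp[:, shared]              # still contains NaN at depth
masks_shared = masks[:, shared]
lat_shared = lat[shared]
```

With option B, lat and lon are identical for every depth, at the cost of keeping NaN values in the deeper layers, which downstream code then has to handle (for example by using `masks_shared`).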
Version number
I am using the following versions of the anemoi packages (pip freeze)
inakbk changed the title from "Is there an easy way to drop all nan values in a dataset?" to "How to handle nan values at different depth layers? (Ocean data)" on Nov 7, 2024
To Reproduce
Steps to reproduce the behavior:
norkyst-one.yaml:
anemoi-datasets create norkyst-one.yaml data/norkyst_v3_2024_01_01.zarr
URL to sample input data
https://thredds.met.no/thredds/catalog/fou-hi/norkystv3_his_files/2024/01/01/catalog.html