Offline diagnostics fail KeyError: 'pressure_thickness_of_atmospheric_layer' #2136

Open
oliverwm1 opened this issue Jan 10, 2023 · 1 comment

@oliverwm1
Contributor

I'm trying to generate an offline report for a radiative flux model. The test data (gs://vcm-ml-intermediate/2023-01-09/prescribed-radiative-fluxes-for-training-rad-flux-model.zarr) does not include pressure_thickness_of_atmospheric_layer, since it is not needed as a model input or output, but its absence raises a KeyError when computing offline diagnostics. The traceback points to vcm.DerivedMapping, which is called from add_derived_data in loaders.

I don't have a minimal reproducer, but can try to make one if it would be helpful.
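
For illustration only, the failure mode can be sketched with a pure-Python stand-in. This is not the vcm.DerivedMapping implementation: the class name `DerivedMappingSketch`, its constructor, and the variable names other than pressure_thickness_of_atmospheric_layer are hypothetical. It assumes only what the traceback suggests, namely that a lookup for a name falls through to the underlying data and raises KeyError when the variable is absent.

```python
from collections.abc import Mapping


class DerivedMappingSketch(Mapping):
    """Hypothetical stand-in; NOT the real vcm.DerivedMapping."""

    def __init__(self, data, derived=None):
        self._data = dict(data)  # variables stored in the test data
        self._derived = dict(derived or {})  # name -> function(mapping)

    def __getitem__(self, key):
        if key in self._derived:
            return self._derived[key](self)
        # Anything not derivable falls through to the stored data,
        # so a variable missing from the data raises KeyError here.
        return self._data[key]

    def __iter__(self):
        return iter(set(self._data) | set(self._derived))

    def __len__(self):
        return len(set(self._data) | set(self._derived))

    def dataset(self, keys):
        # Every requested key is looked up, whether or not the
        # model itself uses it as an input or output.
        return {key: self[key] for key in keys}


# Illustrative test data without the pressure thickness variable.
data = {"air_temperature": [280.0, 275.0], "surface_flux": [300.0, 250.0]}
mapping = DerivedMappingSketch(data)

missing = None
try:
    mapping.dataset(["air_temperature", "pressure_thickness_of_atmospheric_layer"])
except KeyError as err:
    missing = err.args[0]
    print(f"KeyError: {missing!r}")
```

In this sketch the error comes from the list of requested keys, not from the model's own inputs and outputs, which is consistent with the description above but is an assumption about the real code path.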

Traceback:

+ python -m fv3net.diagnostics.offline.compute gs://vcm-ml-experiments/default/2023-01-10/rad-flux-fine-only-ml-trial-0/trained_models/radiative_fluxes test_data.yaml gs://vcm-ml-experiments/default/2023-01-10/rad-flux-fine-only-ml-trial-0/offline_diags/radiative_fluxes
2023-01-10 22:23:17.975636: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-01-10 22:23:17.975692: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
offline_diags 2023-01-10 22:23:35,199: compute/L296 Starting diagnostics routine.
offline_diags 2023-01-10 22:23:39,092: compute/L309 Opening ML model
2023-01-10 22:23:39.379723: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2023-01-10 22:23:39.379874: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2023-01-10 22:23:39.379933: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (test-train-eval-prog-18217ca22fb5-3012164664): /proc/driver/nvidia/version does not exist
2023-01-10 22:23:39.380322: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
tensorflow 2023-01-10 22:23:40,903: load/L167 No training configuration found in save file, so the model was *not* compiled. Compile it manually.
Traceback (most recent call last):
  File "/opt/conda/envs/fv3net/lib/python3.8/site-packages/xarray/core/dataset.py", line 1398, in _construct_dataarray
    variable = self._variables[name]
KeyError: 'pressure_thickness_of_atmospheric_layer'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/fv3net/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/envs/fv3net/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/jovyan/fv3net/workflows/diagnostics/fv3net/diagnostics/offline/compute.py", line 402, in <module>
    main(args)
  File "/home/jovyan/fv3net/workflows/diagnostics/fv3net/diagnostics/offline/compute.py", line 316, in main
    ds_predicted = get_prediction(
  File "/home/jovyan/fv3net/workflows/diagnostics/fv3net/diagnostics/offline/compute.py", line 281, in get_prediction
    concatted_batches = _daskify_sequence(batches)
  File "/home/jovyan/fv3net/workflows/diagnostics/fv3net/diagnostics/offline/compute.py", line 288, in _daskify_sequence
    for i, batch in enumerate(batches):
  File "/opt/conda/envs/fv3net/lib/python3.8/_collections_abc.py", line 874, in __iter__
    v = self[i]
  File "/home/jovyan/fv3net/external/loaders/loaders/batches/_sequences.py", line 143, in __getitem__
    return self._func(self._args[item])
  File "/home/jovyan/fv3net/external/loaders/loaders/batches/_sequences.py", line 143, in __getitem__
    return self._func(self._args[item])
  File "/opt/conda/envs/fv3net/lib/python3.8/site-packages/toolz/functoolz.py", line 488, in __call__
    ret = f(ret)
  File "/opt/conda/envs/fv3net/lib/python3.8/site-packages/toolz/functoolz.py", line 303, in __call__
    return self._partial(*args, **kwargs)
  File "/home/jovyan/fv3net/external/loaders/loaders/_utils.py", line 106, in add_derived_data
    return derived_mapping.dataset(variables)
  File "/home/jovyan/fv3net/external/vcm/vcm/derived_mapping.py", line 81, in dataset
    return xr.Dataset(self._data_arrays(keys))
  File "/home/jovyan/fv3net/external/vcm/vcm/derived_mapping.py", line 78, in _data_arrays
    return {key: self[key] for key in keys}
  File "/home/jovyan/fv3net/external/vcm/vcm/derived_mapping.py", line 78, in <dictcomp>
    return {key: self[key] for key in keys}
  File "/home/jovyan/fv3net/external/vcm/vcm/derived_mapping.py", line 66, in __getitem__
    return self._mapper[key]
  File "/opt/conda/envs/fv3net/lib/python3.8/site-packages/xarray/core/dataset.py", line 1502, in __getitem__
    return self._construct_dataarray(key)
  File "/opt/conda/envs/fv3net/lib/python3.8/site-packages/xarray/core/dataset.py", line 1400, in _construct_dataarray
    _, name, variable = _get_virtual_variable(
  File "/opt/conda/envs/fv3net/lib/python3.8/site-packages/xarray/core/dataset.py", line 173, in _get_virtual_variable
    ref_var = variables[ref_name]
KeyError: 'pressure_thickness_of_atmospheric_layer'
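
The traceback shows add_derived_data passing a list of variables to derived_mapping.dataset, which then looks up each key. One possible mitigation, sketched here under assumptions, is a pre-flight check that compares the requested names against what the data provides and reports every missing name at once. The helper `check_required_variables` and the example variable lists are hypothetical and are not part of fv3net, loaders, or vcm.

```python
def check_required_variables(available, requested, derivable=()):
    """Return the requested names that are neither stored nor derivable.

    Hypothetical helper for illustration; not an fv3net or vcm function.
    """
    known = set(available) | set(derivable)
    return sorted(name for name in requested if name not in known)


# Illustrative inputs: names stored in the test data and names requested.
available = ["air_temperature", "specific_humidity"]
requested = ["air_temperature", "pressure_thickness_of_atmospheric_layer"]

missing = check_required_variables(available, requested)
if missing:
    # A single descriptive message instead of a bare KeyError deep in the stack.
    print(f"Test data is missing variables requested for diagnostics: {missing}")
```

Whether the diagnostics should skip such variables or fail early with a message like this is a design decision for the maintainers; the sketch only shows the comparison.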
@oliverwm1
Contributor Author

test_data.yaml contents:

mapper_config:
  function: open_zarr
  kwargs:
    data_path: gs://vcm-ml-intermediate/2023-01-09/prescribed-radiative-fluxes-for-training-rad-flux-model.zarr
timesteps_per_batch: 10
timesteps:
- '20160805.170000'
- '20160806.010000'
- '20160806.070000'
- '20160806.190000'
- '20160807.220000'
- '20160809.100000'
- '20160810.120000'
- '20160811.050000'
- '20160811.180000'
- '20160811.210000'
- '20160813.060000'
- '20160813.170000'
- '20160814.040000'
- '20160814.210000'
- '20160816.080000'
- '20160817.170000'
- '20160817.210000'
- '20160818.050000'
- '20160818.100000'
- '20160819.220000'
- '20160820.040000'
- '20160820.150000'
- '20160821.050000'
- '20160821.120000'
- '20160822.000000'
- '20160823.090000'
- '20160824.200000'
- '20160826.040000'
- '20160826.140000'
- '20160826.230000'
- '20160828.030000'
- '20160829.010000'
- '20160829.220000'
- '20160830.120000'
- '20160830.220000'
- '20160831.050000'
- '20160831.200000'
- '20160901.190000'
- '20160902.000000'
- '20160902.050000'
- '20160902.190000'
- '20160903.220000'
- '20160904.060000'
- '20160904.130000'
- '20160904.190000'
- '20160905.120000'
- '20160905.150000'
- '20160906.180000'
- '20160908.040000'
- '20160908.210000'
