Improve WWatch3 wind and currents forcing file generation #271

Merged
douglatornell merged 12 commits into main from improve-ww3-prep on Jun 13, 2024

Conversation

douglatornell (Member)

These changes were motivated by the intermittent stalls of the make_ww3_current_file and make_ww3_wind_file workers that have plagued operations for the past year. The changes reflect the many things that have been learned about using xarray.open_dataset(), xarray.open_mfdataset(), and dask since the workers were written.

  • Change to the h5netcdf package for dataset reads in the hope of improving the reliability of the make_ww3_current_file and make_ww3_wind_file workers
  • Remove more unnecessary variables for dataset reads to reduce the memory load
  • Adjust time_counter chunk sizes based on testing to improve performance
  • Change to use processes instead of the default threads for dask processing, and explicitly call .compute() before writing the datasets so that the h5netcdf reads happen in the process-based dask scheduler
  • Explicitly use netcdf4 as the engine for dataset writing. This avoids incompatibilities in the resulting file that arise if it is written using h5netcdf (see the combined sketch after this list)
  • Improve worker main() function docstrings; re: issue #121 (Update worker main() function docstrings)
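
A minimal sketch of the combined read/compute/write pattern the bullets above describe. The file paths and variable names are hypothetical, for illustration only; the real workers assemble their own HRDPS/NEMO file lists and drop lists:

```python
import dask
import xarray as xr

# Hypothetical inputs; the actual workers build these lists themselves
forcing_files = ["day1_fields.nc", "day2_fields.nc"]
unneeded_vars = ["extra_var_a", "extra_var_b"]

# Read with h5netcdf to sidestep netcdf4 thread-safety issues, drop
# unneeded variables, and chunk along time_counter
ds = xr.open_mfdataset(
    forcing_files,
    engine="h5netcdf",
    drop_variables=unneeded_vars,
    chunks={"time_counter": 3},
)

# Use the process-based dask scheduler and force the reads to complete
# with .compute() before writing (in a script, put this under an
# `if __name__ == "__main__":` guard because processes use multiprocessing)
with dask.config.set(scheduler="processes"):
    ds = ds.compute()

# Write with the netcdf4 engine; files written with h5netcdf can have
# incompatibilities downstream
ds.to_netcdf("ww3_forcing.nc", engine="netcdf4")
```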

Hoping to improve the reliability of the make_ww3_current_file and
make_ww3_wind_file workers by changing them to use h5netcdf for dataset reads.

The change results in version updates for multiple dependencies in
requirements.txt, and inclusion of the h5netcdf package across multiple
environment files and the pyproject.toml file.
Removed the uninformative "Set up and run the worker." line at the beginning
of the main() function docstrings.

re: issue #121
Refactored the WWatch3 wind file generation to drop more unneeded variables,
reducing the memory load, and updated the corresponding tests to accurately
represent these changes.
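
As a rough illustration of the memory-reduction idea, a minimal sketch; the file path and variable names here are hypothetical, not the worker's actual drop list:

```python
import xarray as xr

# Hypothetical unneeded variables; the worker's actual drop list differs
unneeded = ["extra_var_a", "extra_var_b"]

full = xr.open_dataset("nemo_fields.nc", engine="h5netcdf")
slim = xr.open_dataset("nemo_fields.nc", engine="h5netcdf", drop_variables=unneeded)

# Compare the (lazy) in-memory sizes to see what dropping the variables saves
print(f"{full.nbytes / 1e6:.1f} MB -> {slim.nbytes / 1e6:.1f} MB")
```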
The h5netcdf engine is now used for opening datasets in
'make_ww3_wind_file.py' to avoid the netcdf4 package's thread-safety issues.
The corresponding test cases in 'test_make_ww3_wind_file.py' have been
updated to reflect this change.
The chunk size for the time_counter variable in the make_ww3_wind_file worker
and corresponding tests has been increased. This change is anticipated to
improve efficiency by processing data in larger batches.
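
The PR reports the outcome of the chunk-size testing rather than the harness used; a hedged sketch of how such a comparison might be timed, with a hypothetical input file:

```python
import time

import xarray as xr

for chunk in (1, 3, 6, 12):  # hypothetical candidate chunk sizes
    start = time.perf_counter()
    ds = xr.open_dataset(
        "hrdps_winds.nc",  # hypothetical input file
        engine="h5netcdf",
        chunks={"time_counter": chunk},
    )
    ds.compute()  # force the read so the timing is meaningful
    print(f"time_counter chunk={chunk}: {time.perf_counter() - start:.2f}s")
```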
The wind file creation process for the WWatch3 model has been updated to use
processes rather than the default threads for the dask scheduler. Processes have
been found to be more reliable for dask operations on the types of workloads we
use in SalishSeaCast.
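
For reference, the process-based scheduler can be selected in dask either globally or per call; a minimal, self-contained sketch:

```python
import dask
import dask.array as da

if __name__ == "__main__":  # guard needed because the processes scheduler spawns workers
    # Global: subsequent .compute() calls use the process-based scheduler
    dask.config.set(scheduler="processes")

    x = da.ones((1000, 1000), chunks=(100, 100))
    # The scheduler can also be chosen for a single computation
    total = x.sum().compute(scheduler="processes")
    print(total)
```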
Explicitly use netcdf4 as the engine for dataset writing. This avoids
incompatibilities in the resulting file that arise if it is written using
h5netcdf.
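
Putting the two engine choices side by side, a minimal sketch with hypothetical file names:

```python
import xarray as xr

# Read with h5netcdf for thread-safety ...
ds = xr.open_dataset("in.nc", engine="h5netcdf")
# ... but write with netcdf4 so the resulting file avoids the
# incompatibilities described above
ds.to_netcdf("out.nc", engine="netcdf4")
```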
Refactored the WWatch3 current file generation to drop more unneeded variables,
reducing the memory load, and updated the corresponding tests to accurately
represent these changes.
The h5netcdf engine is now used for opening datasets in
'make_ww3_current_file.py' to avoid the netcdf4 package's thread-safety issues.
The corresponding test cases in 'test_make_ww3_current_file.py' have been
updated to reflect this change.
The chunk size for the time_counter variable in the make_ww3_current_file
worker was decreased from 3 to 1. Testing showed that the smaller chunk size
resulted in slightly faster processing.
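
The currents worker thus opens its datasets with the smaller chunk; a minimal sketch with a hypothetical path:

```python
import xarray as xr

ds = xr.open_dataset(
    "nemo_currents.nc",  # hypothetical input file
    engine="h5netcdf",
    chunks={"time_counter": 1},  # 1 for currents; the wind worker uses a larger chunk
)
```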
The currents file creation process for the WWatch3 model has been updated to use
processes rather than the default threads for the dask scheduler. Processes have
been found to be more reliable for dask operations on the types of workloads we
use in SalishSeaCast.
Explicitly use netcdf4 as the engine for dataset writing. This avoids
incompatibilities in the resulting file that arise if it is written using
h5netcdf.
@douglatornell douglatornell added the enhancement (New feature or request) and Workers labels on Jun 7, 2024
@douglatornell douglatornell added this to the v24.1 milestone Jun 7, 2024
Copy link

codecov bot commented Jun 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.64%. Comparing base (1eca15e) to head (6cda311).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #271      +/-   ##
==========================================
+ Coverage   77.63%   77.64%   +0.01%     
==========================================
  Files         133      133              
  Lines       18686    18695       +9     
  Branches     1910     1910              
==========================================
+ Hits        14506    14515       +9     
  Misses       4113     4113              
  Partials       67       67              
Flag Coverage Δ
unittests 77.64% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.


@douglatornell (Member, Author)

Successfully tested in production on arbutus since 1-Jun-2024. No worker stalls, but we sometimes went 30 days without a stall with the previous code. Merging because this PR doesn't break anything, and there is reason to hope that it improves the reliability of the workers.

@douglatornell douglatornell merged commit 4373f04 into main Jun 13, 2024
10 checks passed
@douglatornell douglatornell deleted the improve-ww3-prep branch June 13, 2024 18:08