Improve WWatch3 wind and currents forcing file generation #271
Conversation
Hoping to improve the reliability of the make_ww3_current_file and make_ww3_wind_file workers by changing them to use h5netcdf for dataset reads. The change results in version updates for several dependencies in requirements.txt, and adds the h5netcdf package to the environment files and the pyproject.toml file.
Removed the uninformative "Set up and run the worker." line at the beginning of the docstrings; re: issue #121.
Refactored the WWatch3 wind file generation to drop more unneeded variables and reduce the memory load, and improved the related tests to accurately reflect these changes.
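For illustration only, a minimal sketch of dropping variables at read time so they never occupy memory; the file name and variable names are placeholders, not the worker's actual list:

```python
import xarray as xr

# Placeholder variable names; the worker drops its own list of
# variables that the WWatch3 wind forcing file does not need.
unneeded = ["precip", "solar", "tair"]
ds = xr.open_dataset("hrdps_forcing.nc", drop_variables=unneeded)
```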
The h5netcdf engine is now used for opening datasets in 'make_ww3_wind_file.py'. The intent is to avoid the netcdf4 package's thread-safety issues. The corresponding test cases in 'test_make_ww3_wind_file.py' have been updated to reflect this change.
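A minimal sketch of selecting the h5netcdf backend for a read, assuming a placeholder file name:

```python
import xarray as xr

# h5netcdf is an alternative HDF5-based backend for netCDF-4 files;
# selecting it here sidesteps the netcdf4 package's thread-safety issues.
ds = xr.open_dataset("ops_wind.nc", engine="h5netcdf")
```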
The chunk size for the time_counter variable in the make_ww3_wind_file worker and corresponding tests has been increased. This change is anticipated to improve efficiency by processing data in larger batches.
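A sketch of chunking along time_counter at open time; the chunk size 24 is a placeholder, since the value the worker now uses is not stated here:

```python
import xarray as xr

# Larger chunks along time_counter mean fewer, bigger dask tasks.
# The value 24 is a placeholder, not the worker's actual chunk size.
ds = xr.open_dataset("ops_wind.nc", engine="h5netcdf", chunks={"time_counter": 24})
```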
The wind file creation process for the WWatch3 model has been updated to use processes rather than the default threads for the dask scheduler. Processes have been found to be more reliable for dask operations on the types of workloads we use in SalishSeaCast.
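A sketch of running a block of dask work on the multiprocessing scheduler instead of the default threaded scheduler; the path and chunk size are placeholders:

```python
import dask
import xarray as xr

# Placeholder path and chunking; in practice this is the forcing dataset.
ds = xr.open_dataset("ops_wind.nc", engine="h5netcdf", chunks={"time_counter": 24})

# Evaluate the lazy reads with the multiprocessing scheduler rather than
# the default threaded scheduler.
with dask.config.set(scheduler="processes"):
    ds = ds.compute()
```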
Explicitly use netcdf4 as the engine for dataset writing. This avoids incompatibilities in the resulting file that arise if it is written using h5netcdf.
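A sketch of writing with the netCDF4 backend regardless of which backend was used for reading; the dataset here is a stand-in:

```python
import xarray as xr

ds = xr.Dataset({"u_wind": ("time_counter", [1.0, 2.0, 3.0])})  # stand-in dataset

# Write with the netcdf4 engine even though reads use h5netcdf,
# to avoid incompatibilities in the resulting file.
ds.to_netcdf("example_wind.nc", engine="netcdf4")
```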
Refactored the WWatch3 current file generation to drop more unneeded variables and reduce the memory load, and improved the related tests to accurately reflect these changes.
The h5netcdf engine is now used for opening datasets in 'make_ww3_current_file.py'. The intent is to avoid the netcdf4 package's thread-safety issues. The corresponding test cases in 'test_make_ww3_current_file.py' have been updated to reflect this change.
The chunk size for the time_counter variable in the make_ww3_current_file worker has been decreased from 3 to 1. Testing showed that the smaller chunk size resulted in slightly faster processing.
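A sketch of the one-time-step-per-chunk read for the currents datasets; the file name is a placeholder:

```python
import xarray as xr

# One time step per chunk along time_counter (placeholder file name).
ds = xr.open_dataset("nemo_currents.nc", engine="h5netcdf", chunks={"time_counter": 1})
```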
The currents file creation process for the WWatch3 model has been updated to use processes rather than the default threads for the dask scheduler. Processes have been found to be more reliable for dask operations on the types of workloads we use in SalishSeaCast.
Explicitly use netcdf4 as the engine for dataset writing. This avoids incompatibilities in the resulting file that arise if it is written using h5netcdf.
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@ Coverage Diff @@
## main #271 +/- ##
==========================================
+ Coverage 77.63% 77.64% +0.01%
==========================================
Files 133 133
Lines 18686 18695 +9
Branches 1910 1910
==========================================
+ Hits 14506 14515 +9
Misses 4113 4113
Partials 67 67
Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Successfully tested in production on
These changes were motivated by the intermittent stalls of the make_ww3_current_file and make_ww3_wind_file workers that have plagued operations for the past year. The changes reflect the many things that have been learned about using xarray.open_dataset(), xarray.open_mfdataset(), and dask since the workers were written.

- Use the h5netcdf package for dataset reads in the hope of improving the reliability of the make_ww3_current_file and make_ww3_wind_file workers
- Use process-based dask processing and explicitly call .compute() before writing the datasets, to force read processing with h5netcdf in the process-based dask scheduler (see the sketch after this list)
- Explicitly use netcdf4 as the engine for dataset writing; this avoids incompatibilities in the resulting file that arise if it is written using h5netcdf
- Update worker main() function docstrings (re: issue #121, Update worker main() function docstrings)