[13pt] Possible bug in Sierra test / rating_curve_comparion #1302

Open
RobHanna-NOAA opened this issue Sep 23, 2024 · 3 comments · May be fixed by #1388

@RobHanna-NOAA
Contributor

RobHanna-NOAA commented Sep 23, 2024

A PR (1301) is pending merge for rating_curve_comparison, which simply adds a duration system.

During testing for that change, a possible bug was detected in the logic.

A large number of HUCs processed well, but the run also produced a very large number of both warnings and errors.

Below is a copy/paste from the log outputs:

doesn't seem like any 15050302 made it through
missing USGS rating curve data for usgs station 09484500 in huc 15050302
missing USGS rating curve data for usgs station 09484000 in huc 15050302
missing USGS rating curve data for usgs station 09485000 in huc 15050302
missing USGS rating curve data for usgs station 09484580 in huc 15050302


	  WARNING: missing USGS elevation data for usgs station 01487000 in huc 02080109
	  WARNING: missing USGS elevation data for usgs station 01487000 in huc 02080109
	  WARNING: nwm_recurr_data_table is missing location_id column for gage ['01488110', '01487000'] in huc 02080109
	  WARNING: rating curve dataframe not processed correctly...
	  Summary: [<FrameSummary file /foss_fim/tools/rating_curve_comparison.py, line 645 in generate_facet_plot>, <FrameSummary file /foss_fim/tools/rating_curve_comparison.py, line 381 in generate_rating_curve_metrics>, <FrameSummary file /usr/lib/python3.10/multiprocessing/pool.py, line 48 in mapstar>, <FrameSummary file /usr/lib/python3.10/multiprocessing/pool.py, line 125 in worker>, <FrameSummary file /usr/lib/python3.10/multiprocessing/process.py, line 108 in run>, <FrameSummary file /usr/lib/python3.10/multiprocessing/process.py, line 314 in _bootstrap>, <FrameSummary file /usr/lib/python3.10/multiprocessing/popen_fork.py, line 71 in _launch>, <FrameSummary file /usr/lib/python3.10/multiprocessing/popen_fork.py, line 19 in __init__>, <FrameSummary file /usr/lib/python3.10/multiprocessing/context.py, line 281 in _Popen>, <FrameSummary file /usr/lib/python3.10/multiprocessing/process.py, line 121 in start>, <FrameSummary file /usr/lib/python3.10/multiprocessing/pool.py, line 329 in _repopulate_pool_static>, <FrameSummary file /usr/lib/python3.10/multiprocessing/pool.py, line 306 in _repopulate_pool>, <FrameSummary file /usr/lib/python3.10/multiprocessing/pool.py, line 215 in __init__>, <FrameSummary file /usr/lib/python3.10/multiprocessing/context.py, line 119 in Pool>, <FrameSummary file /foss_fim/tools/rating_curve_comparison.py, line 1370 in <module>>] 
	 Exception: 
	 AttributeError("'DataFrame' object has no attribute 'location_id'")
	  WARNING: rating curve dataframe not processed correctly...
	  Summary: [<FrameSummary file /foss_fim/tools/rating_curve_comparison.py, line 645 in generate_facet_plot>, <FrameSummary file /foss_fim/tools/rating_curve_comparison.py, line 381 in generate_rating_curve_metrics>, <FrameSummary file /usr/lib/python3.10/multiprocessing/pool.py, line 48 in mapstar>, <FrameSummary file /usr/lib/python3.10/multiprocessing/pool.py, line 125 in worker>, <FrameSummary file /usr/lib/python3.10/multiprocessing/process.py, line 108 in run>, <FrameSummary file /usr/lib/python3.10/multiprocessing/process.py, line 314 in _bootstrap>, <FrameSummary file /usr/lib/python3.10/multiprocessing/popen_fork.py, line 71 in _launch>, <FrameSummary file /usr/lib/python3.10/multiprocessing/popen_fork.py, line 19 in __init__>, <FrameSummary file /usr/lib/python3.10/multiprocessing/context.py, line 281 in _Popen>, <FrameSummary file /usr/lib/python3.10/multiprocessing/process.py, line 121 in start>, <FrameSummary file /usr/lib/python3.10/multiprocessing/pool.py, line 329 in _repopulate_pool_static>, <FrameSummary file /usr/lib/python3.10/multiprocessing/pool.py, line 306 in _repopulate_pool>, <FrameSummary file /usr/lib/python3.10/multiprocessing/pool.py, line 215 in __init__>, <FrameSummary file /usr/lib/python3.10/multiprocessing/context.py, line 119 in Pool>, <FrameSummary file /foss_fim/tools/rating_curve_comparison.py, line 1370 in <module>>] 

As mentioned, this might not be a problem. If not, maybe we can add some notes to the file to help manage expectations.
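For reference, the `AttributeError` in the log above is what pandas raises when a column is accessed as an attribute on a DataFrame that does not have it. A minimal sketch of a guard follows, assuming the table is a pandas DataFrame; the helper names `missing_columns` and `select_gage_rows` are hypothetical and are not taken from `rating_curve_comparison.py`.

```python
from __future__ import annotations

import logging

import pandas as pd


def missing_columns(df: pd.DataFrame, required: list[str]) -> list[str]:
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]


def select_gage_rows(table: pd.DataFrame, gages: list[str], huc: str) -> pd.DataFrame | None:
    """Hypothetical helper: return rows for the given gages, or None if the table is unusable."""
    absent = missing_columns(table, ["location_id"])
    if absent:
        # Log once and skip, rather than failing later on table.location_id
        logging.warning(
            "table is missing column(s) %s for gages %s in huc %s; skipping",
            absent, gages, huc,
        )
        return None
    # Bracket access avoids the attribute lookup that triggers AttributeError
    return table[table["location_id"].isin(gages)]
```

A caller could check for `None` and move on to the next HUC, so that a missing column yields one clear warning instead of a warning followed by a stack summary.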

This review and possible change will be needed within approximately the next four weeks, as we are preparing for another BED/production run.


Update: Nov 13, 2024

We need to create the output folder earlier in the stack:
Traceback (most recent call last):
  File "//foss_fim/tools/rating_curve_comparison.py", line 1303, in <module>
    logging.FileHandler(os.path.join(output_dir, f'rating_curve_comparison_{log_dt_string}.log')),
  File "/usr/lib/python3.10/logging/__init__.py", line 1169, in __init__
    StreamHandler.__init__(self, self._open())
  File "/usr/lib/python3.10/logging/__init__.py", line 1201, in _open
    return open_func(self.baseFilename, self.mode,
FileNotFoundError: [Errno 2] No such file or directory: '/data/fim_performance/hand_4_5_11_1/rating_curve_comparison/rating_curve_comparison_2024_11_13-18_27_46.log'
Wed Nov 13 18:27:46 UTC 2024
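
One possible way to handle this, as a minimal sketch: `logging.FileHandler` opens its file as soon as it is constructed, so the output folder has to exist before the handler is created. The function name `setup_file_logging` and the logging configuration shown here are assumptions for illustration, not the script's actual setup code.

```python
import logging
import os
from datetime import datetime, timezone


def setup_file_logging(output_dir: str) -> str:
    """Create output_dir if needed, then log to a timestamped file inside it."""
    # Create the folder first: FileHandler opens the file immediately and
    # raises FileNotFoundError if the parent directory does not exist.
    os.makedirs(output_dir, exist_ok=True)

    log_dt_string = datetime.now(timezone.utc).strftime("%Y_%m_%d-%H_%M_%S")
    log_path = os.path.join(output_dir, f"rating_curve_comparison_{log_dt_string}.log")

    logging.basicConfig(
        level=logging.INFO,
        handlers=[logging.FileHandler(log_path), logging.StreamHandler()],
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )
    return log_path
```

With `exist_ok=True`, the call is safe whether or not the folder is already there, so it can sit at the very top of the entry point.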

@RobHanna-NOAA
Contributor Author

RobHanna-NOAA commented Nov 13, 2024

Duration system added with 1301, but this other part could still be a bug. Also added a new bug at the bottom of the main body about no such dir when starting the log file.

@CarsonPruitt-NOAA CarsonPruitt-NOAA changed the title [8pt] Possible bug in Sierra test / rating_curve_comparion [13pt] Possible bug in Sierra test / rating_curve_comparion Dec 3, 2024
@ZahraGhahremani ZahraGhahremani linked a pull request Dec 31, 2024 that will close this issue
20 tasks
@ZahraGhahremani
Contributor

ZahraGhahremani commented Jan 1, 2025

Here are some examples comparing the new results with the old results.
The vertical lines appear in the new plots. Some locations still do not have any vertical lines because they lack USGS rating curve data (missing USGS rating curve data for USGS station 09484500 in HUC 15050302). Therefore, this is not a problem.
However, the vertical lines were not plotted previously for some location_ids with more than one feature_id.

image

The script will always use the first feature_id in the new feature branch unless its flow is significantly lower than the USGS maximum flow (see figure below).
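
To make the rule above concrete, here is a rough sketch of that selection logic. The threshold `min_ratio`, the fallback to the candidate with the largest flow, and the function name `pick_feature_id` are all assumptions for illustration; the comment only says the first feature_id is used unless its flow is "significantly lower" than the USGS maximum flow.

```python
from __future__ import annotations

from typing import Sequence


def pick_feature_id(
    candidates: Sequence[tuple[int, float]],
    usgs_max_flow: float,
    min_ratio: float = 0.5,  # assumed cutoff for "significantly lower"
) -> int:
    """Pick a feature_id from (feature_id, max_flow) pairs given in their original order."""
    if not candidates:
        raise ValueError("no candidate feature_ids")

    first_id, first_flow = candidates[0]
    # Default case: keep the first feature_id when its flow is comparable
    # to the USGS maximum flow.
    if first_flow >= min_ratio * usgs_max_flow:
        return first_id

    # Assumed fallback: take the candidate with the largest maximum flow.
    best_id, _ = max(candidates, key=lambda pair: pair[1])
    return best_id
```

For example, with made-up values, `pick_feature_id([(1, 50.0), (2, 950.0)], 1000.0)` skips the first candidate because its flow is far below the USGS maximum.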

image

This likely occurs due to incorrect crosswalking. In the figure below, the red dot represents the USGS gauge. The blue lines indicate the flowlines for the nonzero branch (left figure), while the green lines represent the flowlines for branch 0 (right figure). The associated flow for the red lines is significantly lower for branch 0, which should be part of the main channel and therefore have a higher flow.

image

@RobHanna-NOAA
Contributor Author

Ok, cool. Well, let's just make sure to catch and handle it gracefully. :)
