Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hardcoded File Path Causing Test Failure #860

Open
amberchen122 opened this issue Jul 22, 2024 · 6 comments · May be fixed by #785
Open

Hardcoded File Path Causing Test Failure #860

amberchen122 opened this issue Jul 22, 2024 · 6 comments · May be fixed by #785

Comments

@amberchen122
Copy link
Collaborator

Description
The hard coded file path in test/test_api.py::TestAPI::test_open_mf_dataset is unsafe. I added a test file to test/meshfiles/ugrid/outCSne30/outCSne30_test2.nc from Paul, and it failed.

Details

The test test/test_api.py::TestAPI::test_open_mf_dataset is failing after I added test/meshfiles/ugrid/outCSne30/outCSne30_test2.nc.

I think the reason is because of this hardcoded file path incorrectly matching outCSne30_test2.nc.

Error Message

FAILED test/test_api.py::TestAPI::test_open_mf_dataset - ValueError: Could not find any dimension coordinates to use to order the datasets for concatenation
@github-project-automation github-project-automation bot moved this to 📚 Backlog in UXarray Development Jul 22, 2024
@amberchen122 amberchen122 linked a pull request Jul 22, 2024 that will close this issue
14 tasks
@philipc2
Copy link
Member

Yeah this should really be a list of paths since that file path will assume any paths that match the outCSne30_ prefix with a .nc ending. We should have it point to the two data files as seen below.

image

dsfiles_mf_ne30  = [current_path / "meshfiles" / "ugrid" / "outCSne30" / "outCSne30_var2.nc",
                    current_path / "meshfiles" / "ugrid" / "outCSne30" / "outCSne30_vortex.nc"]

@erogluorhan
Copy link
Member

erogluorhan commented Jul 23, 2024

Even though explicitly listing the file names to be opened with open_mfdatasets can be an alternative solution, I don't think giving file name patterns as in here is the actual reason of the problem.

That is common practice to have multiple data files from the same workflow in the same folder, e.g. separate files for different time stamps in a time-series analysis or separate files for different variables in a big simulation. That said, if we think all of the files under the test/meshfiles/ugrid/outCSne30/ directory are somehow related to each other in such a way, let us keep storing them there; otherwise, we need to consider oving some of them into a different folder. If the former, then I'd expect a code something like this should still work:

dsfiles_mf_ne30 = str( current_path) + "/meshfiles/ugrid/outCSne30/outCSne30_*.nc"

uxds_mf_ne30 = ux.open_mfdataset(self.gridfile_ne30, self.dsfiles_mf_ne30, , concat_dim="Time", combine="nested") # Use the actual dimension variable to be concatenated over those files

@amberchen122
Copy link
Collaborator Author

Thank you for your input! I will move the test files needed for zonal-mean testing (test/meshfiles/ugrid/outCSne30_zonal_test/outCSne30_test2.nc and test/meshfiles/ugrid/outCSne30_zonal_test/outCSne30_test2.nc) to a new folder: test/meshfiles/ugrid/outCSne30_zonal_test.

Does this sound like a good solution for the zonal-mean PR failing the API test?

@erogluorhan
Copy link
Member

Before that, please determine if those newer zonal test files are very close in format/content with the original outCSne30_var2.nc and outCSne30_vortex.nc files. If that's the case, I'd instead suggest that you do the following code change in the failing tests, and it should work:

dsfiles_mf_ne30 = str( current_path) + "/meshfiles/ugrid/outCSne30/outCSne30_*.nc"

uxds_mf_ne30 = ux.open_mfdataset(self.gridfile_ne30, self.dsfiles_mf_ne30, , concat_dim="Time", combine="nested") # Use the actual dimension variable to be concatenated over those files

If that's not the case though, then your suggestion looks good.

@amberchen122
Copy link
Collaborator Author

amberchen122 commented Jul 30, 2024

Before that, please determine if those newer zonal test files are very close in format/content with the original outCSne30_var2.nc and outCSne30_vortex.nc files. If that's the case, I'd instead suggest that you do the following code change in the failing tests, and it should work:

dsfiles_mf_ne30 = str( current_path) + "/meshfiles/ugrid/outCSne30/outCSne30_*.nc"

uxds_mf_ne30 = ux.open_mfdataset(self.gridfile_ne30, self.dsfiles_mf_ne30, , concat_dim="Time", combine="nested") # Use the actual dimension variable to be concatenated over those files

If that's not the case though, then your suggestion looks good.

Thank you for your suggestion. I believe the zonal test files are in close format with the original test files. I have updated the test_open_mf_dataset as you recommended.

The new uxds_mf_ne30.uxgrid._ds.data_vars has one more dimension edge_node_connectivity:

test/test_api.py uxds_mf_ne30.uxgrid._ds.data_vars:
grid_topology int32 4B -2147483647
face_node_connectivity (n_face, n_max_face_nodes) int64 173kB 0 8 ... 6 298
node_lon (n_node) float64 43kB -45.0 45.0 ... 138.0 135.0
node_lat (n_node) float64 43kB ...
edge_node_connectivity (n_edge, two) int64 173kB 0 8 0 ... 5400 5400 5401

Therefore, I also modified the test constants in test/constants.py. Could you please verify if the new test parameters look good? The modifications are in the zonal-mean branch.

If this solution to the test_api.py bug is acceptable, then PR#785 is also ready for review. :)

@erogluorhan
Copy link
Member

Yes, this looks like a good solution to me; thanks a lot! Hey @philipc2 , do you agree with me that this will be a better solution forward?

@hongyuchen1030 hongyuchen1030 linked a pull request Jul 31, 2024 that will close this issue
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 📚 Backlog
Development

Successfully merging a pull request may close this issue.

3 participants