Add detailed documentation of the dataset configuration file #124

Open
wants to merge 9 commits into
base: master
from


favyen2
Collaborator

@favyen2 favyen2 commented Jan 15, 2025

The main goal of this PR is to add documentation for the dataset configuration file. This lives in the new docs/DatasetConfig.md, along with the start of one example at docs/examples/WindowsFromGeojson.md.

However, I ended up needing to make some auxiliary changes:

  • We have never used max_time_delta, so I removed the option. It added temporal padding to the window's time range before searching for matching items in some data sources. I then realized that it was actually used by the BEFORE / AFTER time modes, which we have also never used. I updated the meaning of these modes so that both look for items within the window's time range, but in a different order: BEFORE now looks in reverse time order, starting with the items closest to the window's end time, while AFTER looks in forward time order, starting with the items closest to the window's start time.
  • Fixed the states and years config options for Naip, which were not exposed properly.
  • LocalFiles: I was trying to use the LocalFiles data source with a global GeoJSON of marine infrastructure, but the item was not matching any geometries. The file covered such a big region that, after re-projection to an individual UTM zone, the result did not actually span that zone, since the re-projection in rasterio wasn't designed to handle geometries that big. There is now special code in rslearn.utils.geometry and rslearn.data_sources.utils to handle "global" geometries, and a vector file that is big enough is treated as "global".
  • Relatedly, vector data was previously not cropped to match windows; the whole file was included instead. I added cropping in rslearn.tile_stores.default.DefaultTileStore.
  • Fixed an issue with caching the data source per worker process: it wasn't actually being cached, since LayerConfig didn't implement __eq__ and __hash__.
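
To illustrate the caching bullet, here is a minimal, self-contained sketch of why a cache keyed on a config object misses when the class lacks __eq__ and __hash__. The classes below are hypothetical stand-ins, not the real LayerConfig. Using functools.lru_cache is an assumption for illustration and may not match how rslearn caches data sources, and the "sentinel2" layer name is made up.

```python
from functools import lru_cache


class ConfigWithoutEq:
    """Hypothetical stand-in for a config class with no __eq__/__hash__.

    Python falls back to identity-based equality and hashing, so two
    instances with identical fields are treated as different cache keys.
    """

    def __init__(self, name: str) -> None:
        self.name = name


class ConfigWithEq:
    """Hypothetical stand-in that compares and hashes by its fields."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> bool:
        return isinstance(other, ConfigWithEq) and self.name == other.name

    def __hash__(self) -> int:
        return hash(self.name)


# Counts how often the (pretend) expensive construction actually runs.
calls = {"count": 0}


@lru_cache(maxsize=None)
def load_data_source(config) -> str:
    """Pretend to build an expensive data source for the given config."""
    calls["count"] += 1
    return f"data source for {config.name}"


# Equal-looking configs without __eq__/__hash__: every call is a cache miss.
load_data_source(ConfigWithoutEq("sentinel2"))
load_data_source(ConfigWithoutEq("sentinel2"))
misses_without_eq = calls["count"]

# With __eq__/__hash__ defined, the second call hits the cache.
calls["count"] = 0
load_data_source(ConfigWithEq("sentinel2"))
load_data_source(ConfigWithEq("sentinel2"))
misses_with_eq = calls["count"]
```

In this sketch the first pair of calls constructs the data source twice and the second pair only once, which is the kind of behavior change the fix is after.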

@favyen2 favyen2 marked this pull request as draft January 15, 2025 23:33
@favyen2 favyen2 marked this pull request as ready for review January 17, 2025 17:17
@favyen2 favyen2 requested a review from yawenzzzz January 17, 2025 17:17