Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GTI stac-geoparquet driver support for multiple band assets #11321

Open
rbavery opened this issue Nov 21, 2024 · 3 comments
Open

GTI stac-geoparquet driver support for multiple band assets #11321

rbavery opened this issue Nov 21, 2024 · 3 comments

Comments

@rbavery
Copy link

rbavery commented Nov 21, 2024

Feature description

I'd like to be able to load STAC items that have bands referenced as distinct geotiffs. The current GTI docs indicate it is limited to using a location field that points to one href per STAC Item.

An example I am working with is here: stac-utils/stac-rs#528 (comment)

install

conda create -n test python=3.11 rasterio==1.4.2 gdal==3.10 libgdal==3.10 rioxarray libgdal-arrow-parquet 
conda activate test
# you may have to run `rustup default 1.80` to pip install stacrs
pip install stacrs

setting up the query

from datetime import datetime, timedelta
import stacrs
def convert_to_iso_datetime_range(simple_date_range):
    start_date_str, end_date_str = simple_date_range.split('/')
    start_date = datetime.strptime(start_date_str, "%Y-%m-%d")
    end_date = datetime.strptime(end_date_str, "%Y-%m-%d")
    start_date_iso = start_date.strftime("%Y-%m-%dT%H:%M:%S.%fZ")[:-3] + "Z"
    end_date_iso = (end_date - timedelta(seconds=1)).strftime("%Y-%m-%dT%H:%M:%S.%fZ")[:-3] + "Z"
    return f"{start_date_iso}/{end_date_iso}"

max_cloud_cover = 15
max_nodata_percent = 15
query_args = {
            "eo:cloud_cover": {"lt": max_cloud_cover},
            "s2:nodata_pixel_percentage": {"lt": max_nodata_percent},
        }
simple_date_range = "2023-01-01/2024-01-01"
iso_datetime_range = convert_to_iso_datetime_range(simple_date_range)
collection_id = "sentinel-2-c1-l2a"
catalog = "https://earth-search.aws.element84.com/v1"
lon, lat = -111.760826, 34.871002

# stack args
assets=["rededge1", "rededge2", "rededge3", "nir", "swir16", "swir22"]
rescale=False
xy_coords="center"
stack_id = "grid:code"
stack_dim = "time"

stacrs.search_to(
    "items.parquet",
    catalog,
    collections=collection_id,
    intersects={"type": "Point", "coordinates": [lon, lat]},
    sortby="-properties.datetime",
    max_items=20,
    datetime=iso_datetime_range,
    query=query_args
)

workaround to access the blue band and set it's href as the location field to confirm GTI works

import geopandas as gpd
df = gpd.read_parquet("items.parquet")
df['location'] = df.assets.apply(lambda x : x['blue']['href'])
df.to_parquet("items_with_location.parquet")
ds = xarray.open_dataset("GTI:/home/rave/dask_on_ray_poc/items_with_location.parquet", engine = "rasterio")
ds.values

Ideally the GTI driver would interpret the assets section of the stac-geoparquet and load all the bands, if the assets only referenced bands. But this isn't the case

df.assets[0].keys()
dict_keys(['swir16', 'tileinfo_metadata', 'scl', 'thumbnail', 'nir08', 'swir22', 'product_metadata', 'preview', 'nir09', 'nir', 'rededge2', 'rededge3', 'rededge1', 'aot', 'red', 'wvp', 'coastal', 'blue', 'cloud', 'visual', 'snow', 'granule_metadata', 'green'])

I'm wondering if GTI could look for a different column that points to a list of asset keys that refer to bands that also specifies band order for assembling a multi band mosaic. then it could handle fetching the hrefs from the stac-geoparquet assets section.

Additional context

This would be a nice quality of life feature for various users, top of mind for me is the TorchGeo project which is exploring how to easily make stac queries, persist them, and create mosaics of large raster datasets for model training.

I can help by finding someone to PR if I'm not able to do this work myself and if maintainers don't want to take this on but think it would still be valuable.

@rouault
Copy link
Member

rouault commented Nov 21, 2024

Attaching the related items.parquet (as it takes quite a bit to have the setup to generate it): items.parquet.zip

This ticket intersects the previous one of #11317 , so with #11319 , now it is possible to specify the LOCATION_FIELD open option to get one particular asset/band:

$ GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR gdalinfo GTI:items.parquet -oo LOCATION_FIELD=assets.rededge3.href
Driver: GTI/GDAL Raster Tile Index
Files: none associated
Size is 6018, 4962
Coordinate System is:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        MEMBER["World Geodetic System 1984 (G2296)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Origin = (-112.099469246340178,35.243022132412960)
Pixel Size = (0.000200522510948,-0.000200522510948)
Corner Coordinates:
Upper Left  (-112.0994692,  35.2430221) (112d 5'58.09"W, 35d14'34.88"N)
Lower Left  (-112.0994692,  34.2480294) (112d 5'58.09"W, 34d14'52.91"N)
Upper Right (-110.8927248,  35.2430221) (110d53'33.81"W, 35d14'34.88"N)
Lower Right (-110.8927248,  34.2480294) (110d53'33.81"W, 34d14'52.91"N)
Center      (-111.4960970,  34.7455258) (111d29'45.95"W, 34d44'43.89"N)
Band 1 Block=256x256 Type=UInt16, ColorInterp=Gray
  NoData Value=0

So with that and classic VRT, an admitedly not so ergonomic solution would be to create a VRT per band, and assemble all those VRT, like:

GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR gdal_translate  GTI:items.parquet -oo LOCATION_FIELD=assets.rededge1.href rededge1.vrt
GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR gdal_translate  GTI:items.parquet -oo LOCATION_FIELD=assets.rededge3.href rededge3.vrt
GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR gdalbuildvrt -separate out.vrt rededge1.vrt rededge3.vrt

It would be indeed better to be able to specify maybe -oo LOCATION_FIELD=assets.*.href to ask to create automatically a multi-band GTI from the assets.

That said that would be tricky here because Sentinel 2 bands don't have all the same resolution, whereas in a GDAL dataset all bands must have the same resolution. So probably that internally the GTI driver would need to do the above workaround and align on the 10m resolution, or do like then Sentinel2 driver and expose one subdataset for the 10m bands, one for the 20 m ones and one for the 60m ones

rouault added a commit to rouault/gdal that referenced this issue Nov 21, 2024
…ts.XXX.proj:transform

fields in addition to top level proj:epsg and proj:transform.
Also handle geotransform being a IntegerList or Integer64List.

Refs OSGeo#11321 (but does no fix it)
@rbavery
Copy link
Author

rbavery commented Nov 21, 2024

So probably that internally the GTI driver would need to do the above workaround and align on the 10m resolution, or do like then Sentinel2 driver and expose one subdataset for the 10m bands, one for the 20 m ones and one for the 60m ones

Thanks for the work on this! The second option sounds interesting. I'd like it if there was some limited set of sensible choices for resolution for a given STAC collection that GTI could derive.

My initial expectation was that resolution would be a user specified arg but that's only because I've used the opendatacube odc.stac.load function before

https://odc-stac.readthedocs.io/en/latest/_api/odc.stac.load.html#odc.stac.load

for example

ds = odc.stac.load(
    stac_items_list,
    resolution=20
)

@mdsumner
Copy link
Contributor

odc exposes a limited set of target output grid params, and heuristics to pick a UTM zone if not specified - GTI's virtualness is offering a slot to add this explicitly, but uses similar heuristics to make a choice. It's a kind of slippery user/ux area where you want maximum flexibility (specifying one thing, or the entire grid details, or anything in between including no-user input).

There's all kinds of things these heuristics+rules need, like pixel alignment, and resolution-choice (good for when you know), dimension-choice (good for when you want a region for a quick look at n-pixels) - and there's no single framework to guide the user well here, and I don't this this is well understood (entirely off topic but just wanted to add a bit here).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants