Position of TileOffsets and TileByteCounts #5

Open
tbonfort opened this issue Oct 16, 2019 · 9 comments

@tbonfort

The spec requires that the TileOffsets and TileByteCounts entries that don't fit in the directory immediately follow the directory. Given that reading those entries is deferred when opening the directory, wouldn't it be more efficient to place these overflow entries after all directories instead of after each directory?

i.e. the current layout, as defined in the spec, is:

  • TIFF / BigTIFF signature
  • IFD (Image File Directory) of full resolution image
  • Values of TIFF tags that don't fit inline in the IFD directory, such as TileOffsets, TileByteCounts and GeoTIFF keys
  • Optional: IFD (Image File Directory) of first overview (typically subsampled by a factor of 2), followed by the values of its tags that don't fit inline
  • Optional: IFD (Image File Directory) of second overview (typically subsampled by a factor of 4), followed by the values of its tags that don't fit inline
  • ...
  • Optional: IFD (Image File Directory) of last overview (typically subsampled by a factor of 2^N), followed by the values of its tags that don't fit inline
  • Optional: tile content of last overview level
  • ...
  • Optional: tile content of first overview level
  • Tile content of full resolution image.

Rearranging it into the following layout would allow the whole image structure to be read with a much smaller GET request. The TileOffsets and TileByteCounts arrays can either be interleaved with the image data (as shown) or all be placed between the last IFD and the first block of image content:

  • TIFF / BigTIFF signature
  • IFD (Image File Directory) of full resolution image
  • Values of TIFF tags that don't fit inline in the IFD directory, except for TileOffsets, TileByteCounts
  • Optional: IFD (Image File Directory) of first overview (typically subsampled by a factor of 2), followed by the values of its tags that don't fit inline (except for TileOffsets, TileByteCounts)
  • Optional: IFD (Image File Directory) of second overview (typically subsampled by a factor of 4), followed by the values of its tags that don't fit inline (except for TileOffsets, TileByteCounts)
  • ...
  • Optional: IFD (Image File Directory) of last overview (typically subsampled by a factor of 2^N), followed by the values of its tags that don't fit inline (except for TileOffsets, TileByteCounts)
  • Optional: TileOffsets and TileByteCounts of last overview level
  • Optional: tile content of last overview level
  • ...
  • Optional: TileOffsets and TileByteCounts of first overview level
  • Optional: tile content of first overview level
  • TileOffsets and TileByteCounts of full resolution image
  • Tile content of full resolution image.
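To put rough numbers on the "much smaller GET request", here is a back-of-the-envelope sketch. It assumes a hypothetical 40000 × 40000 BigTIFF with 512 × 512 tiles, four overviews that each halve the dimensions (rounding up), 8-byte LONG8 elements in TileOffsets and TileByteCounts, and a made-up 400 bytes per IFD including its other out-of-line tag values. It computes how many leading bytes a client must fetch to reach the end of the last IFD under each layout; all names are illustrative only.

```python
import math

HEADER_BYTES = 16  # BigTIFF header size (a classic TIFF header is 8 bytes)
ELEMENT_BYTES = 8  # assumption: LONG8 elements in TileOffsets/TileByteCounts
INLINE_BYTES = 8   # BigTIFF IFD entries hold up to 8 bytes of value inline


def tiles(width: int, height: int, tile: int = 512) -> int:
    """Number of tiles needed to cover a width x height image."""
    return math.ceil(width / tile) * math.ceil(height / tile)


def pyramid(width: int, height: int, n_overviews: int, tile: int = 512) -> list[int]:
    """Tile count per level, full resolution first, each overview halving the size."""
    return [
        tiles(math.ceil(width / 2**k), math.ceil(height / 2**k), tile)
        for k in range(n_overviews + 1)
    ]


def array_bytes(n_tiles: int) -> int:
    """Out-of-line bytes for TileOffsets + TileByteCounts of one level."""
    per_array = n_tiles * ELEMENT_BYTES
    return 0 if per_array <= INLINE_BYTES else 2 * per_array


def prefix_current(tile_counts: list[int], ifd_bytes: int) -> int:
    """Spec layout: every IFD except the last is followed by its tile arrays,
    which a reader must download just to reach the next IFD."""
    skipped = sum(array_bytes(n) for n in tile_counts[:-1])
    return HEADER_BYTES + len(tile_counts) * ifd_bytes + skipped


def prefix_proposed(tile_counts: list[int], ifd_bytes: int) -> int:
    """Proposed layout: all IFDs are contiguous, tile arrays come later."""
    return HEADER_BYTES + len(tile_counts) * ifd_bytes


counts = pyramid(40000, 40000, n_overviews=4)
print(counts, prefix_current(counts, 400), prefix_proposed(counts, 400))
```

Under these assumptions the spec layout needs about 132 KiB before the last IFD is reached, against about 2 KiB for the proposed one; the difference is dominated by the full-resolution TileOffsets/TileByteCounts arrays.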
@rouault
Collaborator

rouault commented Oct 16, 2019

That's a good point. Actually, that's more or less the layout adopted by the COG driver of GDAL master (3.1dev), as documented in https://gdal.org/drivers/raster/cog.html#raster-cog. It puts the TileOffsets and TileByteCounts arrays just after the IFDs and before the first image data.

@tbonfort
Author

@rouault do images produced by the COG driver validate with validate_cloud_optimized_geotiff.py? If not, should the script be updated to account for that layout (and the interleaved one proposed here)?

@rouault
Collaborator

rouault commented Oct 16, 2019

Yes, validate_cloud_optimized_geotiff.py in the GDAL repository has been updated to accept files produced by the COG driver (the validation script is used by the autotests of the COG driver), but it doesn't go into enough detail to check where the TileOffsets/TileByteCounts arrays are located in the file. As we don't have formal requirements, the validation script doesn't necessarily check everything.

@tbonfort
Author

Should I open a PR for this?

@rouault
Collaborator

rouault commented Oct 18, 2019

> Should I open a PR for this?

If you want, but you'll likely have to parse the IFDs by hand to find that information, as it is internal information kept hidden by libtiff. Non-optimal placement of TileOffsets/TileByteCounts should probably only be reported as a warning until we have formal requirements for what is a COG and what is not.
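Parsing the IFDs by hand is fairly mechanical for classic TIFF. The following is a minimal sketch, assuming a classic (non-BigTIFF) file whose TileOffsets (tag 324) and TileByteCounts (tag 325) are stored as SHORT or LONG; the function names are hypothetical and not part of validate_cloud_optimized_geotiff.py. It walks the IFD chain, records where each out-of-line tile array starts, and flags arrays that begin before the end of the last IFD, i.e. arrays interleaved between directories as in the current spec layout. A validator could surface that result as a warning rather than an error. It does not check where other out-of-line values (such as GeoTIFF keys) sit.

```python
import struct

TILE_OFFSETS, TILE_BYTE_COUNTS = 324, 325
TYPE_SIZES = {3: 2, 4: 4}  # SHORT, LONG: the types classic TIFF uses for these tags


def scan_classic_tiff(data: bytes):
    """Walk the IFD chain of a classic TIFF.

    Returns (ifds, arrays): ifds is a list of (start, end) byte ranges and
    arrays a list of (tag, file_offset) for tile arrays stored out of line.
    """
    order = {b"II": "<", b"MM": ">"}[data[:2]]
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses magic number 43)")
    ifds, arrays, seen = [], [], set()
    while ifd_offset and ifd_offset not in seen:  # 'seen' guards against IFD loops
        seen.add(ifd_offset)
        (n,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
        # 2-byte entry count + n 12-byte entries + 4-byte next-IFD pointer
        end = ifd_offset + 2 + 12 * n + 4
        for i in range(n):
            pos = ifd_offset + 2 + 12 * i
            tag, typ, count, value = struct.unpack(order + "HHII", data[pos:pos + 12])
            if tag in (TILE_OFFSETS, TILE_BYTE_COUNTS) and count * TYPE_SIZES.get(typ, 4) > 4:
                arrays.append((tag, value))  # value field holds the array's file offset
        ifds.append((ifd_offset, end))
        (ifd_offset,) = struct.unpack(order + "I", data[end - 4:end])
    return ifds, arrays


def misplaced_tile_arrays(data: bytes):
    """Tile arrays starting before the end of the last IFD (warning candidates)."""
    ifds, arrays = scan_classic_tiff(data)
    last_ifd_end = max(end for _, end in ifds)
    return [(tag, offset) for tag, offset in arrays if offset < last_ifd_end]
```

Since only the IFDs themselves are parsed, `data` can be a prefix of the file fetched with a single range request, as long as it covers every IFD.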

@tbonfort
Author

Sorry for the confusion; I was talking about a PR against the spec itself.

@brawer

brawer commented Dec 17, 2021

So, contrary to what the COG spec currently claims, TileOffsets should actually be placed after the last IFD, but before the tile content of the first overview level? Any objections to changing the COG spec to match GDAL’s documentation about LAYOUT=IFDS_BEFORE_DATA?

How should TileOffsets be sorted in a COG file? Probably from coarsest overview to full-resolution image? Or should TileOffsets be placed directly before the imagery data for each level, so a streaming client can quickly display a world overview?

How about TileByteCounts? When using tile data leaders and trailers, TileByteCounts doesn’t appear to be accessed at all, at least not by GDAL, assuming its documentation is correct. Given that, would it make sense to allow placing TileByteCounts at the very end of the file?
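The leader/trailer mechanism is what makes TileByteCounts unnecessary for such a reader. As we understand GDAL's COG driver documentation, each tile payload is preceded by a 4-byte little-endian leader giving its size and followed by a 4-byte trailer repeating the payload's last 4 bytes, and neither is counted in TileByteCounts. The sketch below assumes exactly that layout; the function names are hypothetical, and a real client would work on range-request responses rather than one in-memory buffer.

```python
import struct


def tile_size_from_leader(data: bytes, tile_offset: int) -> int:
    """Payload size read from the 4-byte little-endian leader before the tile."""
    (size,) = struct.unpack("<I", data[tile_offset - 4:tile_offset])
    return size


def trailer_matches(data: bytes, tile_offset: int, size: int) -> bool:
    """True if the 4 bytes after the payload repeat the payload's last 4 bytes."""
    end = tile_offset + size
    return len(data) >= end + 4 and data[end:end + 4] == data[end - 4:end]


def read_tile(data: bytes, tile_offset: int) -> bytes:
    """Extract one tile's payload using only its TileOffsets entry."""
    size = tile_size_from_leader(data, tile_offset)
    if not trailer_matches(data, tile_offset, size):
        # The file may have been rewritten by a tool unaware of leaders/trailers.
        raise ValueError("leader/trailer mismatch; fall back to TileByteCounts")
    return data[tile_offset:tile_offset + size]
```

The trailer check is what lets a reader detect a file that was edited in place without preserving leaders, in which case TileByteCounts is still needed.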

@tbonfort
Author

> So, contrary to what the COG spec currently claims, TileOffsets should actually be placed after the last IFD, but before the tile content of the first overview level?

Correct.

> How should TileOffsets be sorted in a COG file? Probably from coarsest overview to full-resolution image? Or should TileOffsets be placed directly before the imagery data for each level, so a streaming client can quickly display a world overview?

I honestly don't think this makes a difference in the real world, unless you make a single range request large enough to encompass all the IFDs, the coarse-level TileOffsets, and the whole tile data of the coarse level. Such a large "blind" request could be sub-optimal in (many?) other cases.

> How about TileByteCounts? When using tile data leaders and trailers, TileByteCounts doesn’t appear to be accessed at all, at least not by GDAL, assuming its documentation is correct. Given that, would it make sense to allow placing TileByteCounts at the very end of the file?

For GDAL-optimized layouts, I agree that pushing TileByteCounts to the end of the file makes a lot of sense.

@rouault
Collaborator

rouault commented Dec 17, 2021

> For GDAL-optimized layouts, I agree that pushing TileByteCounts to the end of the file makes a lot of sense.

That's something controlled by libtiff. I had to add new behaviour to it to defer the writing of the TileOffsets/TileByteCounts arrays (https://gitlab.com/libtiff/libtiff/-/merge_requests/82). I presume a further enhancement could separate the writing of TileOffsets and TileByteCounts, but that's an extra complication in non-trivial code. Even if nicer, the practical benefit of having TileByteCounts at the end seems limited to very particular use cases.


3 participants