Compression of ADAPT serialized datasets #127
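How about .tar.bz2, as it is an open format with wide usage and support, and doesn't support encryption? GZip is also a good candidate instead of BZip2, if it is viewed as more available/accessible.
Here's my crack at outlining archive/compression support.
Single File Archive / Compression
Any system that supports the creation of an archive of ADAPT standard data SHALL conform to the Archive Structure, and MUST support the Standard Archive Format (tarball bzip2) as an option.
Systems that generate archives SHALL NOT require encryption or password protection.
Archive Structure
• ./adapt.json
  • The adapt.json file is REQUIRED and MUST be at the root of the archive.
• ./**/*
  • Additional files are OPTIONAL, and SHALL only be included if referenced in the adapt.json file (e.g., geospatial rasters/Parquet files).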
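The Archive Structure rules lend themselves to a mechanical check. Below is a minimal sketch, assuming the archive is a .tar.bz2 read with Python's standard-library tarfile module. `check_archive_structure` is a hypothetical helper, and because this thread does not say how adapt.json references its files, the caller passes the referenced paths in as a set instead of parsing them out of adapt.json.

```python
import tarfile


def check_archive_structure(archive_path, referenced_paths):
    """Return a list of Archive Structure problems in an ADAPT .tar.bz2 file.

    `referenced_paths` is an assumed input: the relative paths that adapt.json
    refers to, supplied by the caller because the reference format is not
    specified here.
    """
    problems = []
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        # Normalise names like "./adapt.json" (as GNU tar may write them) to "adapt.json".
        files = {m.name.removeprefix("./") for m in tar.getmembers() if m.isfile()}

    # Rule 1: adapt.json is REQUIRED at the archive root (not in a subfolder).
    if "adapt.json" not in files:
        problems.append("adapt.json is missing from the archive root")

    # Rule 2: any other file SHALL only be present if adapt.json references it.
    wanted = {p.removeprefix("./") for p in referenced_paths}
    for name in sorted(files - {"adapt.json"} - wanted):
        problems.append(f"unreferenced file: {name}")
    return problems
```

An empty list means both rules are satisfied for the paths supplied.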
Standard Archive Format
The Standard Archive Format is a tarball bzip2 file with a .tar.bz2 extension.
Tarballs are an open standard for archiving multiple files into a single file, with broad support across operating systems and programming languages.
BZip2 compression is also an open standard with similar support, and it generally compresses better than GZip.
Tar/bz2 support is widely available and installed by default on many operating systems, including macOS and many Linux distributions. On Windows, additional software such as 7-Zip or WSL may be required.
Creating an archive
tar -cjf archive.tar.bz2 adapt.json ./geospatial/
-c: create a new archive
-j: use bzip2 compression
-f: specify the archive file name
Extracting an archive
tar -xjf archive.tar.bz2
-x: extract files from an archive
-j: use bzip2 compression
-f: specify the archive file name
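For tools that create or read archives programmatically rather than via the tar CLI, Python's standard-library tarfile module handles the same format through its "w:bz2" and "r:bz2" modes. This is a minimal sketch, assuming Python 3.9+. `create_adapt_archive` and `extract_adapt_archive` are hypothetical names, and the "data" extraction filter is applied only where the running Python provides it.

```python
import os
import tarfile


def create_adapt_archive(output_path, root_dir, extra_paths):
    """Roughly `tar -cjf output_path adapt.json <extra_paths...>` run from root_dir."""
    with tarfile.open(output_path, mode="w:bz2") as tar:
        # adapt.json always goes at the archive root.
        tar.add(os.path.join(root_dir, "adapt.json"), arcname="adapt.json")
        for rel in extra_paths:
            # Directories are added recursively, as the tar CLI does.
            tar.add(os.path.join(root_dir, rel), arcname=rel)


def extract_adapt_archive(archive_path, dest_dir):
    """Roughly `tar -xjf archive_path -C dest_dir`."""
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        if hasattr(tarfile, "data_filter"):
            # Rejects absolute paths, ".." components, device files, etc.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
```

For example, `create_adapt_archive("archive.tar.bz2", ".", ["geospatial"])` mirrors the "Creating an archive" command shown above.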
Agreement on 29 November 2023 to adopt the approach above as a recommendation rather than a requirement.
Hi
As I am only an interested reader of ADAPT, I am hijacking an earlier thread instead of creating a new one.
I note that GDAL has implemented GeoParquet spatial sorting functionality in OSGeo/gdal#9185
which should substantially enhance the read speed of large files.
Is this being considered in ADAPT?
Best regards
Andreas Oxenstierna
Dalen Hörbyvägen 53
243 94 Höör
0730-26 97 12
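To illustrate why spatial sorting can help: if features that are close together on the ground are also stored next to each other, each Parquet row group covers a small area, and a reader with a spatial filter can skip whole groups. The sketch below orders features along a Z-order (Morton) curve of their bounding-box centres. This is a swapped-in technique chosen for illustration, not necessarily what GDAL implements, and `morton_key` and `spatially_sort` are hypothetical helpers that work on plain (xmin, ymin, xmax, ymax) tuples.

```python
def morton_key(x, y, extent, bits=16):
    """Z-order (Morton) key: quantise (x, y) onto a 2**bits grid spanning `extent`
    and interleave the bits of the two integer grid coordinates."""
    xmin, ymin, xmax, ymax = extent
    scale = (1 << bits) - 1
    ix = int((x - xmin) / (xmax - xmin) * scale) if xmax > xmin else 0
    iy = int((y - ymin) / (ymax - ymin) * scale) if ymax > ymin else 0
    key = 0
    for i in range(bits):
        key |= ((ix >> i) & 1) << (2 * i)      # x bits in even positions
        key |= ((iy >> i) & 1) << (2 * i + 1)  # y bits in odd positions
    return key


def spatially_sort(bboxes):
    """Row indices ordered by the Morton key of each bbox centre."""
    extent = (
        min(b[0] for b in bboxes),
        min(b[1] for b in bboxes),
        max(b[2] for b in bboxes),
        max(b[3] for b in bboxes),
    )

    def centre_key(i):
        xmin, ymin, xmax, ymax = bboxes[i]
        return morton_key((xmin + xmax) / 2, (ymin + ymax) / 2, extent)

    return sorted(range(len(bboxes)), key=centre_key)
```

Writing rows in the returned order and then splitting them into row groups tends to give groups with tighter extents than an arbitrary order would.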
@Andreasox I suspect your mention of it is the first many of us have heard of it. To date, our decisions have just been that vector data should be stored as GeoParquet and, for common use cases mapping field coverage, all geometries should be polygons. The definition of all other columns is handled in the JSON header data, which maps to the GeoParquet via column index. Are you suggesting that we require the bbox column?
Hi
If it is used purely as a transfer format, a spatial index is not necessary.
But if the data is to be displayed, for example in QGIS, a bbox column should substantially improve the read speed of large files.
I would recommend the bbox column but not make it mandatory, as I assume only GDAL/OGR will make use of it.
Note that GDAL/OGR is used "everywhere" in the GIS sector (including by QGIS), so its GeoParquet support may become widely used over time.
If useful, I can test different performance scenarios given relevant files.
Regards
Andreas Oxenstierna
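As a rough illustration of how a bbox column can speed up reads: GeoParquet 1.1 defines an optional per-row bounding-box "covering" column, a struct with xmin, ymin, xmax and ymax fields. Because Parquet keeps min/max statistics per row group for such columns, a reader can discard row groups whose extent misses the query window without decoding any geometry. The pure-Python sketch below mimics that logic on toy data. Polygons are assumed to be lists of (x, y) rings, and `polygon_bbox`, `row_group_extent` and `row_groups_to_read` are hypothetical helpers, not a GDAL or Parquet API.

```python
def polygon_bbox(rings):
    """Per-row bounding box, shaped like a GeoParquet 1.1 "bbox" covering struct."""
    xs = [x for ring in rings for x, _ in ring]
    ys = [y for ring in rings for _, y in ring]
    return {"xmin": min(xs), "ymin": min(ys), "xmax": max(xs), "ymax": max(ys)}


def row_group_extent(bboxes):
    """Overall extent of one row group, as a reader could derive it from the
    per-row-group min/max statistics of the bbox fields."""
    return (
        min(b["xmin"] for b in bboxes),
        min(b["ymin"] for b in bboxes),
        max(b["xmax"] for b in bboxes),
        max(b["ymax"] for b in bboxes),
    )


def row_groups_to_read(row_groups, query):
    """Indices of row groups whose extent intersects the query window
    (xmin, ymin, xmax, ymax); every other group can be skipped unread."""
    qxmin, qymin, qxmax, qymax = query
    selected = []
    for i, group in enumerate(row_groups):
        gxmin, gymin, gxmax, gymax = row_group_extent(group)
        if gxmin <= qxmax and gxmax >= qxmin and gymin <= qymax and gymax >= qymin:
            selected.append(i)
    return selected
```

Without the bbox column, a reader would have to decode each geometry to reach the same decision, which is why it mainly pays off for display and spatial filtering rather than plain transfer.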
We discussed this in the 27 March 2024 meeting and will not require the bounding box data.
Initial discussion in 1 Nov 2023 meeting about if/how ADAPT datasets should be compressed.
Agreement in the meeting that at no time should an ADAPT dataset contain compressed archives within compressed archives, or have an uncompressed adapt.json file with compressed sub files.
The question of how to compress the adapt.json and its constituent geospatial files was not resolved, however.
Some participants were in favor of ADAPT making no requirement of how entire datasets should be compressed (or not compressed). Other participants suggested we find a compression standard that has wide support and require data be compressed by that and only that.
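The point that was agreed (no compressed archives inside compressed archives, and no uncompressed adapt.json alongside compressed sub-files) can be checked by sniffing the leading "magic" bytes of each file. This is a minimal sketch using only Python's standard library. The magic-byte table is a partial, assumed list of common formats, and the helper names are hypothetical. Parquet files start with PAR1 and are not flagged, even though Parquet may compress its column data internally.

```python
import os
import tarfile

# Leading "magic" bytes of some common compressed/archive formats (partial list).
COMPRESSED_MAGIC = {
    b"BZh": "bzip2",
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
    b"\xfd7zXZ\x00": "xz",
    b"\x28\xb5\x2f\xfd": "zstd",
    b"7z\xbc\xaf\x27\x1c": "7z",
}


def sniff_compression(header):
    """Return the format name if `header` starts with a known magic number."""
    for magic, name in COMPRESSED_MAGIC.items():
        if header.startswith(magic):
            return name
    return None


def nested_compressed_members(archive_path):
    """(member name, format) for each file inside the archive that is itself compressed."""
    found = []
    # "r:*" lets tarfile detect the outer compression (bz2, gz, xz) itself.
    with tarfile.open(archive_path, mode="r:*") as tar:
        for member in tar.getmembers():
            if member.isfile():
                kind = sniff_compression(tar.extractfile(member).read(8))
                if kind:
                    found.append((member.name, kind))
    return found


def compressed_files_in_dir(dataset_dir):
    """(relative path, format) for compressed files in an uncompressed dataset
    folder, i.e. one that holds a plain adapt.json."""
    found = []
    for dirpath, _, filenames in os.walk(dataset_dir):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as fh:
                kind = sniff_compression(fh.read(8))
            if kind:
                found.append((os.path.relpath(path, dataset_dir), kind))
    return sorted(found)
```

An empty result from either function means the dataset does not mix compression levels in the way the meeting ruled out.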