Migration of the GeoParquet specification to the official OGC templates #224

Open
wants to merge 12 commits into
base: main
6 changes: 6 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,2 +1,8 @@
/scripts/data/
/scripts/__pycache__/
/format-specs/relaton/
/format-specs/iev/
.DS_Store
format-specs/document.err.html
format-specs/document.presentation.xml
format-specs/document.html
4 changes: 4 additions & 0 deletions format-specs/Gemfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
source "https://rubygems.org"

gem "metanorma-cli"
gem "relaton-cli"
45 changes: 45 additions & 0 deletions format-specs/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
# GeoParquet Standard

The GeoParquet standard is specified in this directory. For the clearest overview of the requirements see [`geoparquet.md`](geoparquet.md). It is the 'latest' version of the specification, and you can see its version in the [Version and Schema](geoparquet.md#version-and-schema) section of the document. If it has `-dev` in the suffix then it is an unreleased version of the standard. For the stable versions view the `geoparquet.md` file in the git tree tagged in [the releases](https://github.com/opengeospatial/geoparquet/releases), for example [v1.0.0/format-specs/geoparquet.md](https://github.com/opengeospatial/geoparquet/blob/v1.0.0/format-specs/geoparquet.md) for version 1.0.0.

The official OGC GeoParquet standard is also contained in this directory, and it will often lag behind the markdown document. The target version of the GeoParquet standard can be found in the Preface of the [front material document](sections/clause_0_front_material.adoc). The OGC standard is built from the various other documents in this directory. They are `.adoc` files, in the [asciidoc](https://asciidoc.org/) format. They all automatically get built into a single pdf and published at [docs.ogc.org/DRAFTS/24-013.html](https://docs.ogc.org/DRAFTS/24-013.html) by a cron job running on OGC's infrastructure. The 'official' OGC version will be proposed from that draft, and when accepted by the OGC Technical Committee (TC) will become the official 1.0.0 version of the specification.

[Released versions](https://github.com/opengeospatial/geoparquet/releases) of GeoParquet (from the markdown file in this repository) will not be changed when OGC officially releases GeoParquet 1.0.0, so if changes are needed for OGC approval, then they will be released with a new version number. There will continue to be releases from this repository, which will technically remain 'draft' standards until the OGC TC has officially accepted the next version.

## In this directory

The key files and folders in this directory are as follows:

* [`geoparquet.md`](geoparquet.md) - The latest specification overview, which may run ahead of the standard. It consists of narrative explanations and clear tables for people to get a clear idea of all that needs to be done to implement GeoParquet.
* [`schema.json`](schema.json) - The definitive schema that validates GeoParquet metadata to ensure compliance with the standard.
* [`compatible-parquet.md`](compatible-parquet.md) - A set of guidelines for those who would like to produce geospatial Parquet data but are using tools that do not yet fully implement GeoParquet metadata. Not an official part of the standard.
* [`document.adoc`](document.adoc) - The main standard document which sets the order of the other sections. This is less 'human-readable', as it is designed to be an official 'standard', with specific language to detail testable requirements.
* [`sections/`](sections/) - Each section of the standard document is a separate document in this folder. The order in the official standard is determined by the `document.adoc`. Most of these documents are boilerplate.
* [`sections/clause_6_normative_text.adoc`](sections/clause_6_normative_text.adoc) - The main text of the standard. Similar to the
`geoparquet.md`, but links to the definitive `requirements`.
* [`requirements/`](requirements/) - directory for requirements and requirement classes to be referenced in the normative text.
* [`abstract_tests/`](abstract_tests/) - the Abstract Test Suite comprising one test for every requirement.

A number of other folders may be used by the standard in the future. They are currently empty, except for template readmes:

* [`figures`](figures/) - Any figures needed for the standard go in this folder.
* [`images`](images/) - Image files for graphics in the standard go in this folder. Image files for figures go in the `figures` directory. Only place in here images not used in figures (e.g., as parts of tables, as logos, etc.)
* [`code`](code/) - Sample code to accompany the standard, if desired

More information about the document template is [here](https://github.com/opengeospatial/templates/tree/master/standard#readme).

## Authoring the Specification

The GeoParquet markdown file will naturally be a bit 'ahead' of the OGC standard defined in asciidocs. For now the way to author the spec is to just focus on pull requests to the markdown file. The 'community' release will be cut from the markdown file, and then the 'official' OGC release will follow. A volunteer will update all the asciidoc text and requirements to reflect the release, and submit to OGC for official voting.

This may shift in the future, requiring PRs to the markdown to also update the asciidocs, but for now there will just be 'batch' processing of the changes.

An authoring guide for the metanorma / asciidoc editing of the standard is available at [metanorma.org](https://www.metanorma.org/author/ogc/authoring-guide/).

## Building the OGC standard

A local version of the OGC standard can be created by running `docker run -v "$(pwd)":/metanorma -v ${HOME}/.fontist/fonts/:/config/fonts metanorma/metanorma metanorma compile --agree-to-terms -t ogc -x html document.adoc`.

## Auto built document

A daily built document is available at [OGC Document DRAFTS](https://docs.ogc.org/DRAFTS/).
48 changes: 48 additions & 0 deletions format-specs/abstract_tests/ATS_class_core.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
[[ats_core]]
[conformance_class]
====
[%metadata]
identifier:: /conf/core
target:: /req/core
classification:: Target Type:Apache Parquet file
conformance-test:: /conf/core/geometry-columns
conformance-test:: /conf/core/nesting
conformance-test:: /conf/core/repetition
conformance-test:: /conf/core/metadata
conformance-test:: /conf/core/crs
conformance-test:: /conf/core/epoch
conformance-test:: /conf/core/orientation
conformance-test:: /conf/core/bbox
====

==== Geometry columns

include::./TEST001.adoc[]

==== Nesting

include::./TEST002.adoc[]

==== Repetition

include::./TEST003.adoc[]

==== Metadata

include::./TEST004.adoc[]

==== CRS

include::./TEST005.adoc[]

==== Epoch

include::./TEST006.adoc[]

==== Orientation

include::./TEST007.adoc[]

==== Bounding Box

include::./TEST008.adoc[]
5 changes: 5 additions & 0 deletions format-specs/abstract_tests/README.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
This folder contains the Abstract Test Suite.

The test is expressed according to this pattern:

NOTE: for each test, there should be a corresponding requirement in the "requirements" folder.
15 changes: 15 additions & 0 deletions format-specs/abstract_tests/TEST001.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@

[abstract_test]
====
[%metadata]
identifier:: /conf/core/geometry-columns
target:: /req/core/geometry-columns
test-purpose:: Validate that geometry columns are stored using the BYTE_ARRAY parquet type.
test-method::
+
--
1. Verify that geometry columns are stored using the BYTE_ARRAY parquet type.

2. Verify that geometries are encoded as WKB.
--
====
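As a rough illustration of the second step in the test above, the following sketch inspects only the WKB header (the byte-order flag and the geometry type code). It is a minimal sketch under stated assumptions, not a full WKB validator: the function names are hypothetical, and the accepted type codes are limited to the 2D codes 1 to 7 and the 3D codes 1001 to 1007.

```python
import struct

# ISO WKB geometry type codes: 1-7 for 2D, 1001-1007 for the Z variants.
WKB_2D_TYPES = range(1, 8)
WKB_Z_TYPES = range(1001, 1008)


def read_wkb_header(value: bytes):
    """Return (byte_order, geometry_type_code) from the first 5 bytes of a WKB value."""
    if len(value) < 5:
        raise ValueError("WKB value is too short to contain a header")
    # Byte 0 is the byte-order flag: 0 = big endian, 1 = little endian.
    if value[0] == 0:
        order, fmt = "big", ">I"
    elif value[0] == 1:
        order, fmt = "little", "<I"
    else:
        raise ValueError(f"invalid WKB byte-order flag: {value[0]}")
    # Bytes 1-4 hold the geometry type code as an unsigned 32-bit integer.
    (type_code,) = struct.unpack_from(fmt, value, 1)
    return order, type_code


def looks_like_wkb(value: bytes) -> bool:
    """Hypothetical check: the header parses and the type code is a known 2D or Z code."""
    try:
        _, type_code = read_wkb_header(value)
    except ValueError:
        return False
    return type_code in WKB_2D_TYPES or type_code in WKB_Z_TYPES


# A little-endian 2D point at (1.0, 2.0), built by hand for illustration.
point = struct.pack("<BIdd", 1, 1, 1.0, 2.0)
```

A header check like this only rules out values that are clearly not WKB; the coordinate payload would still need to be parsed by a real WKB reader.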
16 changes: 16 additions & 0 deletions format-specs/abstract_tests/TEST002.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@

[abstract_test]
====
[%metadata]
identifier:: /conf/core/nesting
target:: /req/core/nesting
test-purpose:: Validate that geometries are not contained in complex or nested types such as structs, lists, arrays, or map types.
test-method::
+
--
1. Verify that geometry columns are at the root of the schema.

2. Verify that no geometry is a group field or nested in a group.

--
====
16 changes: 16 additions & 0 deletions format-specs/abstract_tests/TEST003.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@

[abstract_test]
====
[%metadata]
identifier:: /conf/core/repetition
target:: /req/core/repetition
test-purpose:: Validate the cardinality of geometry columns.
test-method::
+
--
1. Verify that the cardinality for all geometry columns is “required” (exactly one) or “optional” (zero or one).

2. Verify that no geometry column is repeated.

--
====
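The nesting and repetition tests above could be sketched as a single structural check. The `ColumnInfo` type below is a hypothetical, reader-independent description of a Parquet leaf column; a real implementation would fill it from whatever schema API its Parquet library provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnInfo:
    """Hypothetical description of one Parquet leaf column."""
    path: tuple       # field names from the schema root to the leaf
    repetition: str   # "required", "optional" or "repeated"


def structural_problems(column: ColumnInfo) -> list:
    """Return readable problems for a geometry column; an empty list means it passes."""
    name = ".".join(column.path)
    problems = []
    # A root-level column has a path of exactly one element.
    if len(column.path) != 1:
        problems.append(f"{name}: geometry column is nested in a group")
    # Only "required" and "optional" are allowed; "repeated" is not.
    if column.repetition not in ("required", "optional"):
        problems.append(f"{name}: repetition is {column.repetition!r}")
    return problems
```

For example, `structural_problems(ColumnInfo(("geometry",), "optional"))` returns an empty list, while a column at path `("properties", "geom")` with repetition `"repeated"` yields two problems.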
19 changes: 19 additions & 0 deletions format-specs/abstract_tests/TEST004.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@

[abstract_test]
====
[%metadata]
identifier:: /conf/core/metadata
target:: /req/core/metadata
test-purpose:: Validate the metadata keys contained in the GeoParquet file.
test-method::
+
--

1. Verify that the GeoParquet file includes a geo key in the Parquet metadata (see FileMetaData::key_value_metadata).

2. Verify that the value of this key is a JSON-encoded UTF-8 string representing the file and column metadata that validates against the GeoParquet metadata schema.

3. Verify that each geometry column in the dataset is included in the columns field (specified in <<tbl_file_and_column_metadata_fields>>) with the content specified in <<tbl_column_metadata>>, keyed by the column name.

--
====
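Steps 1 and 3 of the test above, plus the JSON decoding part of step 2, might look like the sketch below. It assumes the Parquet key/value metadata has already been read into a plain `dict` with string keys, and that the caller knows which columns hold geometries; both the function name and that calling convention are hypothetical. Validation against the GeoParquet metadata schema (`schema.json`) is deliberately left out and would need a JSON Schema validator.

```python
import json


def check_geo_metadata(key_value_metadata: dict, geometry_columns: list) -> list:
    """Return problems found in the `geo` metadata; an empty list means the checks pass."""
    raw = key_value_metadata.get("geo")
    if raw is None:
        return ["missing 'geo' key in the Parquet key/value metadata"]
    # Some readers expose metadata values as bytes; the value must be UTF-8 text.
    if isinstance(raw, bytes):
        try:
            raw = raw.decode("utf-8")
        except UnicodeDecodeError:
            return ["'geo' value is not valid UTF-8"]
    try:
        meta = json.loads(raw)
    except json.JSONDecodeError:
        return ["'geo' value is not valid JSON"]
    if not isinstance(meta, dict):
        return ["'geo' value is not a JSON object"]
    columns = meta.get("columns")
    if not isinstance(columns, dict):
        return ["'geo' metadata has no 'columns' object"]
    # Every geometry column must appear in 'columns', keyed by its name.
    return [
        f"geometry column {name!r} is missing from 'columns'"
        for name in geometry_columns
        if name not in columns
    ]
```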
17 changes: 17 additions & 0 deletions format-specs/abstract_tests/TEST005.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@

[abstract_test]
====
[%metadata]
identifier:: /conf/core/crs
target:: /req/core/crs
test-purpose:: Validate that the CRS is correctly specified.
test-method::
+
--

1. If CRS is provided, verify that the CRS is provided in https://proj.org/specifications/projjson.html[PROJJSON] format.

2. If CRS is not provided, verify that all coordinates in the geometries use longitude, latitude based on the WGS84 datum, and that CRS-aware implementations default to https://www.opengis.net/def/crs/OGC/1.3/CRS84[OGC:CRS84].

--
====
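The defaulting behaviour described in the test above could be sketched as follows. The function name is hypothetical, the column metadata is assumed to be the already-decoded JSON object for one geometry column, and the check that a provided CRS "is PROJJSON" is reduced here to "is a JSON object". Passing an explicit null through unchanged is also an assumption of this sketch, not something the test text states.

```python
DEFAULT_CRS = "OGC:CRS84"  # default named in the test text above


def resolve_crs(column_metadata: dict):
    """Return the CRS to assume for one geometry column's metadata object."""
    # No crs key at all: fall back to the default.
    if "crs" not in column_metadata:
        return DEFAULT_CRS
    crs = column_metadata["crs"]
    if crs is None:
        # Assumption: an explicit null marks an unknown CRS and is passed through.
        return None
    # A provided CRS must be a PROJJSON object, not an identifier string.
    if not isinstance(crs, dict):
        raise ValueError("crs must be a PROJJSON object, not " + type(crs).__name__)
    return crs
```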
15 changes: 15 additions & 0 deletions format-specs/abstract_tests/TEST006.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@

[abstract_test]
====
[%metadata]
identifier:: /conf/core/epoch
target:: /req/core/epoch
test-purpose:: If the crs field defines a dynamic CRS, validate that the coordinates are qualified with the epoch at which they are valid.
test-method::
+
--

1. If the crs field defines a dynamic CRS, verify that the coordinates are qualified with the epoch at which they are valid.

--
====
17 changes: 17 additions & 0 deletions format-specs/abstract_tests/TEST007.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@

[abstract_test]
====
[%metadata]
identifier:: /conf/core/orientation
target:: /req/core/orientation
test-purpose:: Validate the winding order of polygons.
test-method::
+
--

1. Verify that all vertices of exterior polygon rings are ordered in the counterclockwise direction.

2. Verify that all interior rings are ordered in the clockwise direction.

--
====
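One common way to check winding order is the sign of the ring's area from the shoelace formula. The sketch below assumes planar coordinates with x increasing to the right and y increasing upwards, and closed rings (first vertex repeated as the last); it does not handle spherical edges. The function names are hypothetical.

```python
def signed_area(ring):
    """Shoelace formula: positive for counterclockwise rings, negative for clockwise."""
    area = 0.0
    # Pair each vertex with its successor; the ring is assumed to be closed.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        area += x1 * y2 - x2 * y1
    return area / 2.0


def polygon_follows_orientation(polygon):
    """True if the exterior ring is counterclockwise and every interior ring is clockwise."""
    exterior, *interiors = polygon
    return signed_area(exterior) > 0 and all(signed_area(r) < 0 for r in interiors)


# A unit square (counterclockwise) with a clockwise square hole.
exterior = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
hole = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.75), (0.75, 0.25), (0.25, 0.25)]
```

Degenerate rings with zero area fail both comparisons, so this sketch reports them as not following the orientation rule.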
14 changes: 14 additions & 0 deletions format-specs/abstract_tests/TEST008.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@

[abstract_test]
====
[%metadata]
identifier:: /conf/core/bbox
target:: /req/core/bbox
test-purpose:: Validate that the bounding boxes are constructed correctly.
test-method::
+
--
1. Verify that the bbox, if specified, is encoded with an array representing the range of values for each dimension in the geometry coordinates.

--
====
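A minimal sketch of the bounding-box check above is shown below. It assumes the element order used by GeoJSON, with all minimum values first and then all maximum values, for example `[xmin, ymin, xmax, ymax]` in 2D; the test text itself only says the array represents the range of values for each dimension. The function names are hypothetical. The well-formedness check does not require `xmin <= xmax`, so boxes that wrap around the antimeridian are not rejected.

```python
def compute_bbox(coords):
    """Return [min_0, ..., min_n, max_0, ..., max_n] for a collection of coordinate tuples."""
    coords = list(coords)
    if not coords:
        raise ValueError("cannot compute a bbox from zero coordinates")
    dims = len(coords[0])
    if any(len(c) != dims for c in coords):
        raise ValueError("coordinates have mixed dimensions")
    mins = [min(c[i] for c in coords) for i in range(dims)]
    maxs = [max(c[i] for c in coords) for i in range(dims)]
    return mins + maxs


def bbox_is_well_formed(bbox, dims):
    """True if bbox is a list with one numeric (min, max) pair per coordinate dimension."""
    return (
        isinstance(bbox, list)
        and len(bbox) == 2 * dims
        and all(isinstance(v, (int, float)) for v in bbox)
    )
```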
1 change: 1 addition & 0 deletions format-specs/code/README.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Sample code may be stored in this folder, organized as you see fit
60 changes: 60 additions & 0 deletions format-specs/document.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
= GeoParquet Specification
:doctype: standard
:encoding: utf-8
:lang: en
:status: draft
:committee: technical
:draft: 3.0
:external-id: http://www.opengis.net/doc/IS/geoparquet/1.0
:docnumber: 24-013
:received-date: 2029-03-30
:issued-date: 2029-03-30
:published-date: 2029-03-30
:fullname: Chris Holmes
:fullname_2: Tim Schaub
:fullname_3: Joris Van den Bossche
:fullname_4: Kyle Barron
:fullname_5: Javier de la Torre
:docsubtype: Interface
:keywords: ogcdoc, OGC document, geoparquet, parquet, columnar, cloud
:submitting-organizations: Planet; CARTO
:mn-document-class: ogc
:mn-output-extensions: xml,html,doc,pdf
:local-cache-only:
:data-uri-image:
:pdf-uri: ./document.pdf
:xml-uri: ./document.xml
:doc-uri: ./document.doc
:edition: 1.0.0

////
Make sure to complete each included document
////
include::sections/clause_0_front_material.adoc[]

include::sections/clause_1_scope.adoc[]

include::sections/clause_2_conformance.adoc[]

include::sections/clause_3_references.adoc[]

include::sections/clause_4_terms_and_definitions.adoc[]

include::sections/clause_5_conventions.adoc[]

include::sections/clause_6_normative_text.adoc[]


////
add or remove annexes after "A" as necessary
////

include::sections/annex-a.adoc[]

////
Revision History should be the last annex before the Bibliography
Bibliography should be the last annex
////
include::sections/annex-history.adoc[]

include::sections/annex-bibliography.adoc[]
5 changes: 5 additions & 0 deletions format-specs/figures/README.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
Figures go here.

Each figure is a separate file with the naming convention:

"FIGn.xxx" where "n" is a number with leading zeroes appropriate for the total number of figures and "xxx" is the appropriate extension for the file type.
5 changes: 5 additions & 0 deletions format-specs/images/README.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
Image files for graphics go here. Image files for figures go in the "figures" directory. Only place in here images not used in figures (e.g., as parts of tables, as logos, etc.)

Each graphic is a separate file with the naming convention:

"GRPn.xxx" where "n" is a sequential number with leading zeroes appropriate for the total number of graphics and "xxx" is the appropriate extension for the file type.
3 changes: 3 additions & 0 deletions format-specs/notes.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
Confirm the target type of the Abstract Test suite. Presumably it is the Parquet file.

Confirm the editors, submitters and contributors.
6 changes: 6 additions & 0 deletions format-specs/recommendations/recommendation001.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
[recommendation]
====
[%metadata]
identifier:: /rec/core/encoding
part:: The geometry encoding SHOULD be the https://portal.ogc.org/files/?artifact_id=18241[OpenGIS® Implementation Specification for Geographic information — Simple feature access — Part 1: Common architecture] WKB representation (using codes for 3D geometry types in the [1001,1007] range).
====
6 changes: 6 additions & 0 deletions format-specs/recommendations/recommendation002.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
[recommendation]
====
[%metadata]
identifier:: /rec/core/orientation-spherical-edges
part:: If edges is “spherical”, the orientation SHOULD always be set to counterclockwise.
====
6 changes: 6 additions & 0 deletions format-specs/recommendations/recommendation003.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
[recommendation]
====
[%metadata]
identifier:: /rec/core/feature-identifiers
part:: If you are using GeoParquet to serialize geospatial data with feature identifiers, you SHOULD create your own https://github.com/apache/parquet-format#metadata[file key/value metadata] to indicate the column that represents this identifier.
====
15 changes: 15 additions & 0 deletions format-specs/requirements/README.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
This folder contains the requirement descriptions.

Each file is a single requirement. The naming convention for these files is:

"REQn.adoc" where "n" corresponds to the requirement number. Numbers should have preceding zeros appropriate for the total number of requirements in the project (e.g., the first requirement could be REQ001 if less than 1000 requirements are anticipated).

The requirement files are integrated into the main document as links.

The requirement is expressed according to this pattern:

NOTE: for each requirement, there should be a corresponding Abstract Test in the "abstract_tests" folder.

NOTE: sample code may reference one or more requirements and should state which requirements are included in the code by adding the following line to the Extended Description:

"#REQS: reqnum1,reqnum2,...reqnumn"
7 changes: 7 additions & 0 deletions format-specs/requirements/requirement001.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
[requirement]
====
[%metadata]
identifier:: /req/core/geometry-columns
part:: Geometry columns SHALL be stored using the BYTE_ARRAY parquet type.
part:: Geometries SHALL be encoded as https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary[Well Known Binary (WKB)].
====
7 changes: 7 additions & 0 deletions format-specs/requirements/requirement002.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
[requirement]
====
[%metadata]
identifier:: /req/core/nesting
part:: Geometry columns SHALL be at the root of the schema.
part:: A geometry SHALL NOT be a group field or nested in a group.
====
7 changes: 7 additions & 0 deletions format-specs/requirements/requirement003.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
[requirement]
====
[%metadata]
identifier:: /req/core/repetition
part:: The repetition for all geometry columns SHALL be “required” (exactly one) or “optional” (zero or one).
part:: A geometry column SHALL NOT be repeated.
====