GEOMESA-3259 FSDS - Add support for GeoParquet #3064

Open
wants to merge 5 commits into
base: main

Commits on Jun 4, 2024

  1. GEOMESA-3259 FSDS - Add support for GeoParquet

    commit 0ea8bff
    Author: adeet1 <[email protected]>
    Date:   Fri Mar 29 20:29:40 2024 +0000
    
        Optimize imports
    
    commit 9ebd85a
    Author: adeet1 <[email protected]>
    Date:   Fri Mar 29 20:12:03 2024 +0000
    
        Initialize bounds as an empty array instead of null
    
        * This fixes a failing unit test "suppress or allow empty output files" in ExportCommandTest.scala
    
    commit 4cff76a
    Author: adeet1 <[email protected]>
    Date:   Fri Mar 29 15:18:09 2024 +0000
    
        Split Parquet and Orc file compaction tests in order to differentiate the comparisons
    
    commit 16d88fd
    Author: adeet1 <[email protected]>
    Date:   Wed Mar 27 20:48:07 2024 +0000
    
        Assert in each partition that GeoParquet metadata bounding boxes across files are correctly merged upon compaction
    
        * Write features with different geometries and coordinates, so we can test the merging of unique bounding boxes.
    
    commit 4197e4d
    Author: adeet1 <[email protected]>
    Date:   Thu Mar 28 21:27:17 2024 +0000
    
        Change thunk to lazy vals
    
    commit 4eaf9fc
    Author: adeet1 <[email protected]>
    Date:   Thu Mar 28 20:22:10 2024 +0000
    
        Implement methods instead of lazy vals
    
    commit c82c0d2
    Author: adeet1 <[email protected]>
    Date:   Thu Mar 28 20:13:56 2024 +0000
    
        Move test scope
    
    commit 09588e8
    Author: adeet1 <[email protected]>
    Date:   Thu Mar 28 20:01:00 2024 +0000
    
        Don't create a GeoParquet metadata string if the SFT has no geometries
    
    commit 137dcb5
    Author: adeet1 <[email protected]>
    Date:   Thu Mar 28 19:36:31 2024 +0000
    
        Re-implement GeoParquet metadata logic to work for SFTs with multiple geometries
    
    commit 360c2c7
    Author: adeet1 <[email protected]>
    Date:   Thu Mar 28 16:58:26 2024 +0000
    
        Change back to GroupReadSupport
    
        * This simply checks if the Parquet file is valid - it won't deserialize/manifest everything and thus saves us some processing
    
    commit 3bce59e
    Author: adeet1 <[email protected]>
    Date:   Thu Mar 28 14:39:34 2024 +0000
    
        Use the released GeoParquet metadata schema, not the dev one
    
    commit 878abb5
    Author: adeet1 <[email protected]>
    Date:   Thu Mar 28 14:30:35 2024 +0000
    
        Optimize imports
    
    commit d49fc3a
    Author: adeet1 <[email protected]>
    Date:   Wed Mar 27 14:47:54 2024 +0000
    
        Assert that the bounding box in the GeoParquet metadata is correct
    
    commit 2ae9574
    Author: adeet1 <[email protected]>
    Date:   Tue Mar 26 23:14:46 2024 +0000
    
        Instantiate the observer directly in SimpleFeatureWriteSupport instead of passing it down from SimpleFeatureParquetWriter
    
    commit 9770a3a
    Author: adeet1 <[email protected]>
    Date:   Fri Mar 22 14:09:05 2024 +0000
    
        Tweak targetSize
    
    commit 604e614
    Author: adeet1 <[email protected]>
    Date:   Wed Mar 20 19:55:59 2024 +0000
    
        Assert that the file metadata adheres to the GeoParquet metadata json schema
    
    commit 2257d6c
    Author: adeet1 <[email protected]>
    Date:   Thu Mar 21 22:03:29 2024 +0000
    
        Deprecate the ParquetFunctionFactory class, but provide backwards compatibility
    
    commit 03e699f
    Author: adeet1 <[email protected]>
    Date:   Thu Mar 21 20:04:43 2024 +0000
    
        Create a new metadata map instance when adding bounding box
    
    commit 8630eed
    Author: adeet1 <[email protected]>
    Date:   Thu Mar 21 18:07:30 2024 +0000
    
        Change BoundsObserver argument back to FileSystemObserver
    
    commit 921274b
    Author: adeet1 <[email protected]>
    Date:   Thu Mar 21 17:53:38 2024 +0000
    
        If the sft has no geometry field, then omit the GeoParquet metadata entirely
    
    commit c1dda99
    Author: adeet1 <[email protected]>
    Date:   Thu Mar 21 17:51:26 2024 +0000
    
        Omit orientation, edges and epoch
    
    commit dabdc43
    Author: adeet1 <[email protected]>
    Date:   Thu Mar 21 17:39:47 2024 +0000
    
        Make variables private to avoid exposing mutable state outside the scope of the class
    
    commit 5eecf48
    Author: adeet1 <[email protected]>
    Date:   Thu Mar 21 17:32:01 2024 +0000
    
        Delete redundant checks in geometry read and write support
    
    commit 0ed5c65
    Author: adeet1 <[email protected]>
    Date:   Thu Mar 21 14:55:29 2024 +0000
    
        Delete duplicate dependency
    
    commit 3dc798d
    Author: adeet1 <[email protected]>
    Date:   Wed Mar 20 19:09:44 2024 +0000
    
        Support backwards compatibility for FilterConverter
    
    commit 7dea125
    Author: adeet1 <[email protected]>
    Date:   Wed Mar 20 15:32:31 2024 +0000
    
        Delete .parquet.crc file after running tests
    
    commit 652bf3a
    Author: Adeet Patel <[email protected]>
    Date:   Mon Feb 12 12:16:35 2024 -0500
    
        GEOMESA-3259 FSDS - Add support for GeoParquet
    
        * Create a BoundsObserver trait, and tweak various classes and methods to use that trait
        * Add an observer to the SimpleFeatureParquetWriter and write records to it, in order to create a bounding box of all the geometries. Add this bounding box to the GeoParquet metadata (which requires the metadata map to be changed to a mutable data structure).
        * Read/write all geometry attributes in binary (a primitive Parquet type) instead of as a pair of x/y doubles (a group Parquet type), using the same converter and attribute writer for all geometry types, while also maintaining backwards compatibility
        * Add support for parsing WKB bytes in the Parquet geometry transformer functions
        * Exclude bounding box from the GeoTools filter and use a spatial index instead
    
        Co-authored-by: Emilio Lahr-Vivaz <[email protected]>
    adeet1 committed Jun 4, 2024
    Commit 4af7d90
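
For readers unfamiliar with the metadata this commit writes, here is a rough sketch (in Python, not the Scala used by this PR) of how per-file bounding boxes might be merged on compaction and placed into the JSON stored under the Parquet `geo` key. The function names `merge_bounds` and `geo_metadata` are hypothetical and do not come from GeoMesa. The field names follow the released GeoParquet 1.0.0 schema, and the optional `orientation`, `edges` and `epoch` fields are omitted as in the commit above. The sketch assumes boxes are `[xmin, ymin, xmax, ymax]` and ignores boxes that cross the antimeridian.

```python
import json


def merge_bounds(boxes):
    """Union of [xmin, ymin, xmax, ymax] boxes; returns [] when there are none."""
    boxes = [b for b in boxes if b]  # skip empty bounds (files that wrote no geometries)
    if not boxes:
        return []
    return [
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    ]


def geo_metadata(bounds_by_column, primary=None):
    """Build the JSON string for the "geo" key, or None if there is no geometry column.

    bounds_by_column maps a geometry column name to the list of boxes
    collected for that column (for example, one box per compacted file).
    """
    if not bounds_by_column:
        return None  # no geometries: omit the GeoParquet metadata entirely
    columns = {}
    for name, boxes in bounds_by_column.items():
        # an empty geometry_types list means the types are not declared
        column = {"encoding": "WKB", "geometry_types": []}
        bbox = merge_bounds(boxes)
        if bbox:
            column["bbox"] = bbox
        columns[name] = column
    # assumption: default to the first column when no primary is given
    primary = primary or next(iter(columns))
    return json.dumps(
        {"version": "1.0.0", "primary_column": primary, "columns": columns}
    )
```

Calling `geo_metadata({"geom": [box_a, box_b]})` yields a single `bbox` covering both inputs, which is the property the compaction tests in this PR assert per partition.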

Commits on Jun 6, 2024

  1. Commit a777517

Commits on Jun 7, 2024

  1. Commit 48f7bd5
  2. Commit 2040c4e

Commits on Jun 20, 2024

  1. Commit 830d1c6
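
The first commit (4af7d90) switches geometry attributes from a pair of x/y doubles to a single binary value and adds WKB parsing to the transformer functions. As an illustration only, the sketch below shows what the WKB bytes for a 2D point look like and how they can be decoded. It is written in Python with the standard `struct` module and is not taken from the PR. The helper names `point_to_wkb` and `wkb_to_point` are hypothetical, and only the plain 2D Point type is handled.

```python
import struct

WKB_POINT = 1  # WKB geometry type code for a 2D Point


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # 1 byte order flag (1 = little endian), uint32 geometry type, two float64 coordinates
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def wkb_to_point(data):
    """Decode a WKB 2D point in either byte order into an (x, y) tuple."""
    order = "<" if data[0] == 1 else ">"  # 0 = big endian, 1 = little endian
    (geom_type,) = struct.unpack_from(order + "I", data, 1)
    if geom_type != WKB_POINT:
        raise ValueError("not a WKB 2D point: type code %d" % geom_type)
    return struct.unpack_from(order + "dd", data, 5)
```

A single binary column like this can hold any geometry type, which is why one converter and one attribute writer can serve all geometry attributes, whereas an x/y group can only represent points.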