Improvements to geoparquet geoarrow conversion #933

Merged
merged 4 commits into from
Jun 27, 2024

@msbarry msbarry commented Jun 27, 2024

The geoparquet geoarrow conversion previously built a map with x/y/z/m entries for every coordinate read from the parquet file, then converted those to JTS coordinates when the geometry was requested. This change eliminates that wasteful conversion by having the parquet reader deserialize x/y/z/m values directly into a JTS CoordinateSequence.
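
To illustrate the idea (not the code in this PR), here is a minimal sketch of writing decoded ordinates straight into one flat `double[]` instead of allocating a map per coordinate. `PackedCoords` and its methods are hypothetical names made up for this example; JTS ships a comparable flat-array layout as `PackedCoordinateSequence`, but the description above does not say which `CoordinateSequence` implementation the reader uses.

```java
// Hypothetical stand-in for a packed coordinate sequence: every ordinate lives in
// one flat double[] rather than in a separate Map per coordinate.
final class PackedCoords {
  private final double[] ordinates;
  private final int dimension; // ordinates per coordinate: 2 = x/y, 3 = x/y/z, 4 = x/y/z/m

  PackedCoords(int size, int dimension) {
    if (size < 0 || dimension < 2 || dimension > 4) {
      throw new IllegalArgumentException("size must be >= 0 and dimension must be 2, 3 or 4");
    }
    this.dimension = dimension;
    this.ordinates = new double[size * dimension];
  }

  // A reader would call this as each value is decoded from the column,
  // so no intermediate per-coordinate object is created.
  void set(int index, int ordinate, double value) {
    ordinates[index * dimension + ordinate] = value;
  }

  double get(int index, int ordinate) {
    return ordinates[index * dimension + ordinate];
  }

  double getX(int index) {
    return get(index, 0);
  }

  double getY(int index) {
    return get(index, 1);
  }

  int size() {
    return ordinates.length / dimension;
  }
}
```

The design point is that the geometry can be handed to downstream code without a second pass that copies values out of per-coordinate maps.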

This also changes the reader to fail fast when the parquet schema doesn't match what's expected for the primary geometry type.
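
As a rough sketch of what a fail-fast check can look like (again, not the implementation in this PR), the example below compares the list nesting depth around the coordinate struct with the depth the GeoArrow encoding uses for each geometry type, and throws before any rows are read. `GeoArrowSchemaCheck`, `validate`, and the idea of passing the depth in as an `int` are assumptions made for this example.

```java
import java.util.Map;

// Hypothetical validator: rejects a column whose list nesting depth does not match
// the GeoArrow encoding of the declared geometry type.
final class GeoArrowSchemaCheck {
  // Number of list levels wrapped around the x/y(/z/m) coordinate struct.
  private static final Map<String, Integer> EXPECTED_DEPTH = Map.of(
    "point", 0,
    "linestring", 1,
    "multipoint", 1,
    "polygon", 2,
    "multilinestring", 2,
    "multipolygon", 3
  );

  static void validate(String geometryType, int actualListDepth) {
    Integer expected = EXPECTED_DEPTH.get(geometryType);
    if (expected == null) {
      throw new IllegalArgumentException("Unsupported geometry type: " + geometryType);
    }
    if (expected != actualListDepth) {
      throw new IllegalArgumentException(
        "Schema mismatch for " + geometryType + ": expected list depth " + expected
          + " but found " + actualListDepth);
    }
  }
}
```

Running a check like this once when the reader is constructed surfaces a bad file immediately, rather than as a confusing error partway through reading geometries.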

sonarcloud bot commented Jun 27, 2024

This Branch: 6fde8d8 (first log below) | Base: 024e387 (second log below)
0:01:10 DEB [archive] - Tile stats:
0:01:10 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (154k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:83k)
2. 9/154/190 (149k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k)
3. 10/308/380 (138k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
4. 10/308/381 (136k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k)
5. 14/4941/6092 (111k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:64k)
6. 14/4941/6093 (110k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k)
7. 14/4940/6092 (99k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
8. 11/616/762 (98k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k)
9. 14/4942/6091 (96k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
10. 11/616/761 (96k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k)
0:01:10 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  154   374   443   583   938   339   433   548   773  1.6k  2.1k  7.2k  6.4k  5.8k  4.5k  7.2k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   25k   15k   13k   17k   15k   12k   25k
              place    0     0   441   441   441   639   712    1k  1.5k  3.1k  5.6k  3.3k  1.7k   795   936  5.6k
            landuse    0     0     0     0   548   694  1.6k  6.8k   17k   44k   59k   50k   38k   19k   12k   59k
     transportation    0     0     0     0   243   782  1.2k  5.9k    8k   24k   17k   19k   65k   48k   34k   65k
           waterway    0     0     0     0   111   118     0     0     0  3.1k  2.4k  2.1k  2.1k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0  1.2k    4k  9.7k   19k   13k  8.2k  4.3k  3.4k  4.4k   19k
transportation_name    0     0     0     0     0     0   369   464  1.2k  1.8k  5.4k  4.6k  3.9k  3.4k   18k   18k
          landcover    0     0     0     0     0     0     0  9.5k   29k   85k   72k   81k   53k   30k   24k   85k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.3k  2.8k  1.4k  1.4k   869  4.3k
         water_name    0     0     0     0     0     0     0     0     0   486   461   433   452  1.2k  1.5k  1.5k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   664   327   273   220   220   664
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k  2.1k    3k  3.4k  2.7k  3.4k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   501   498   83k   83k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
          full tile 7.9k    4k  9.5k  6.5k  3.7k    6k   20k   42k   85k  203k  185k  135k  114k  128k  244k  244k
            gzipped 6.2k  3.5k  7.1k  5.2k  3.1k  4.8k   14k   29k   60k  149k  138k   98k   83k   92k  154k  154k
0:01:10 DEB [archive] -    Max tile: 244k (gzipped: 154k)
0:01:10 DEB [archive] -    Avg tile: 5.4k (gzipped: 4k) using weighted average based on OSM traffic
0:01:10 DEB [archive] -     # tiles: 4,115,036
0:01:10 DEB [archive] -  # features: 5,487,099
0:01:10 INF [archive] - Finished in 19s cpu:1m9s avg:3.6
0:01:10 INF [archive] -   read    1x(3% 0.5s wait:17s done:1s)
0:01:10 INF [archive] -   encode  4x(54% 10s wait:2s done:1s)
0:01:10 INF [archive] -   write   1x(21% 4s wait:13s)
0:01:10 INF [archive] - Finished in 1m11s cpu:3m37s gc:1s avg:3.1
0:01:10 INF [archive] - FINISHED!
0:01:10 INF [archive] - 
0:01:10 INF [archive] - ----------------------------------------
0:01:10 INF [archive] - data errors:
0:01:10 INF [archive] - 	render_snap_fix_input	16,667
0:01:10 INF [archive] - 	osm_multipolygon_missing_way	360
0:01:10 INF [archive] - 	osm_boundary_missing_way	73
0:01:10 INF [archive] - 	merge_snap_fix_input	12
0:01:10 INF [archive] - 	osm_boundary_duplicate_member	2
0:01:10 INF [archive] - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:10 INF [archive] - 	omt_fix_water_before_ne_intersect	1
0:01:10 INF [archive] - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	1
0:01:10 INF [archive] - 	feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix	1
0:01:10 INF [archive] - ----------------------------------------
0:01:10 INF [archive] - 	overall          1m11s cpu:3m37s gc:1s avg:3.1
0:01:10 INF [archive] - 	lake_centerlines 3s cpu:6s avg:2.1
0:01:10 INF [archive] - 	  read     1x(18% 0.5s done:2s)
0:01:10 INF [archive] - 	  process  4x(0% 0s done:2s)
0:01:10 INF [archive] - 	  write    1x(0% 0s done:2s)
0:01:10 INF [archive] - 	water_polygons   15s cpu:42s avg:2.8
0:01:10 INF [archive] - 	  read     1x(41% 6s done:6s)
0:01:10 INF [archive] - 	  process  4x(30% 5s wait:2s done:5s)
0:01:10 INF [archive] - 	  write    1x(4% 0.6s wait:10s done:5s)
0:01:10 INF [archive] - 	natural_earth    12s cpu:18s avg:1.5
0:01:10 INF [archive] - 	  read     1x(52% 6s done:5s)
0:01:10 INF [archive] - 	  process  4x(7% 0.8s wait:6s done:5s)
0:01:10 INF [archive] - 	  write    1x(0% 0s wait:6s done:5s)
0:01:10 INF [archive] - 	osm_pass1        2s cpu:6s avg:3.1
0:01:10 INF [archive] - 	  read     1x(2% 0s wait:2s)
0:01:10 INF [archive] - 	  parse    4x(33% 0.6s)
0:01:10 INF [archive] - 	  process  1x(72% 1s)
0:01:10 INF [archive] - 	osm_pass2        18s cpu:1m12s avg:3.9
0:01:10 INF [archive] - 	  read     1x(0% 0s wait:11s done:8s)
0:01:10 INF [archive] - 	  process  4x(76% 14s)
0:01:10 INF [archive] - 	  write    1x(2% 0.4s wait:18s)
0:01:10 INF [archive] - 	ne_lakes         0s cpu:0s avg:0
0:01:10 INF [archive] - 	boundaries       0s cpu:0s avg:2.9
0:01:10 INF [archive] - 	agg_stop         0s cpu:0s avg:0
0:01:10 INF [archive] - 	sort             1s cpu:4s avg:2.6
0:01:10 INF [archive] - 	  worker  1x(49% 0.7s)
0:01:10 INF [archive] - 	archive          19s cpu:1m9s avg:3.6
0:01:10 INF [archive] - 	  read    1x(3% 0.5s wait:17s done:1s)
0:01:10 INF [archive] - 	  encode  4x(54% 10s wait:2s done:1s)
0:01:10 INF [archive] - 	  write   1x(21% 4s wait:13s)
0:01:10 INF [archive] - ----------------------------------------
0:01:10 INF [archive] - 	archive	108MB
0:01:10 INF [archive] - 	features	281MB
-rw-r--r-- 1 runner docker 84M Jun 27 11:14 run.jar
0:01:04 DEB [archive] - Tile stats:
0:01:04 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (154k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:83k)
2. 9/154/190 (149k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k)
3. 10/308/380 (138k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
4. 10/308/381 (136k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k)
5. 14/4941/6092 (111k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:64k)
6. 14/4941/6093 (110k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k)
7. 14/4940/6092 (99k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
8. 11/616/762 (98k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k)
9. 14/4942/6091 (96k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
10. 11/616/761 (96k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k)
0:01:04 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  154   374   443   583   938   339   433   548   773  1.6k  2.1k  7.2k  6.4k  5.8k  4.5k  7.2k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   25k   15k   13k   17k   15k   12k   25k
              place    0     0   441   441   441   639   712    1k  1.5k  3.1k  5.6k  3.3k  1.7k   795   936  5.6k
            landuse    0     0     0     0   548   694  1.6k  6.8k   17k   44k   59k   50k   38k   19k   12k   59k
     transportation    0     0     0     0   243   782  1.2k  5.9k    8k   24k   17k   19k   65k   48k   34k   65k
           waterway    0     0     0     0   111   118     0     0     0  3.1k  2.4k  2.1k  2.1k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0  1.2k    4k  9.7k   19k   13k  8.2k  4.3k  3.4k  4.4k   19k
transportation_name    0     0     0     0     0     0   369   464  1.2k  1.8k  5.4k  4.6k  3.9k  3.4k   18k   18k
          landcover    0     0     0     0     0     0     0  9.5k   29k   85k   72k   81k   53k   30k   24k   85k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.3k  2.8k  1.4k  1.4k   869  4.3k
         water_name    0     0     0     0     0     0     0     0     0   486   461   433   452  1.2k  1.5k  1.5k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   664   327   273   220   220   664
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k  2.1k    3k  3.4k  2.7k  3.4k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   501   498   83k   83k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
          full tile 7.9k    4k  9.5k  6.5k  3.7k    6k   20k   42k   85k  203k  185k  135k  114k  128k  244k  244k
            gzipped 6.2k  3.5k  7.1k  5.2k  3.1k  4.8k   14k   29k   60k  149k  138k   98k   83k   92k  154k  154k
0:01:04 DEB [archive] -    Max tile: 244k (gzipped: 154k)
0:01:04 DEB [archive] -    Avg tile: 5.4k (gzipped: 4k) using weighted average based on OSM traffic
0:01:04 DEB [archive] -     # tiles: 4,115,036
0:01:04 DEB [archive] -  # features: 5,487,099
0:01:04 INF [archive] - Finished in 19s cpu:1m9s avg:3.6
0:01:04 INF [archive] -   read    1x(3% 0.5s wait:17s)
0:01:04 INF [archive] -   encode  4x(55% 10s wait:2s)
0:01:04 INF [archive] -   write   1x(21% 4s wait:13s)
0:01:04 INF [archive] - Finished in 1m4s cpu:3m31s gc:1s avg:3.3
0:01:04 INF [archive] - FINISHED!
0:01:04 INF [archive] - 
0:01:04 INF [archive] - ----------------------------------------
0:01:04 INF [archive] - data errors:
0:01:04 INF [archive] - 	render_snap_fix_input	16,667
0:01:04 INF [archive] - 	osm_multipolygon_missing_way	360
0:01:04 INF [archive] - 	osm_boundary_missing_way	73
0:01:04 INF [archive] - 	merge_snap_fix_input	12
0:01:04 INF [archive] - 	osm_boundary_duplicate_member	2
0:01:04 INF [archive] - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:04 INF [archive] - 	omt_fix_water_before_ne_intersect	1
0:01:04 INF [archive] - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	1
0:01:04 INF [archive] - 	feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix	1
0:01:04 INF [archive] - ----------------------------------------
0:01:04 INF [archive] - 	overall          1m4s cpu:3m31s gc:1s avg:3.3
0:01:04 INF [archive] - 	lake_centerlines 2s cpu:5s avg:2.4
0:01:04 INF [archive] - 	  read     1x(22% 0.5s done:2s)
0:01:04 INF [archive] - 	  process  4x(0% 0s done:2s)
0:01:04 INF [archive] - 	  write    1x(0% 0s done:2s)
0:01:04 INF [archive] - 	water_polygons   15s cpu:42s avg:2.8
0:01:04 INF [archive] - 	  read     1x(40% 6s done:6s)
0:01:04 INF [archive] - 	  process  4x(29% 4s wait:3s done:5s)
0:01:04 INF [archive] - 	  write    1x(4% 0.6s wait:10s done:5s)
0:01:04 INF [archive] - 	natural_earth    6s cpu:12s avg:2
0:01:04 INF [archive] - 	  read     1x(96% 6s)
0:01:04 INF [archive] - 	  process  4x(13% 0.8s wait:6s)
0:01:04 INF [archive] - 	  write    1x(0% 0s wait:6s)
0:01:04 INF [archive] - 	osm_pass1        2s cpu:6s avg:3.3
0:01:04 INF [archive] - 	  read     1x(2% 0s wait:2s)
0:01:04 INF [archive] - 	  parse    4x(34% 0.6s)
0:01:04 INF [archive] - 	  process  1x(68% 1s)
0:01:04 INF [archive] - 	osm_pass2        18s cpu:1m11s avg:3.9
0:01:04 INF [archive] - 	  read     1x(0% 0s wait:11s done:8s)
0:01:04 INF [archive] - 	  process  4x(76% 14s)
0:01:04 INF [archive] - 	  write    1x(2% 0.4s wait:18s)
0:01:04 INF [archive] - 	ne_lakes         0s cpu:0s avg:0
0:01:04 INF [archive] - 	boundaries       0s cpu:0s avg:2.6
0:01:04 INF [archive] - 	agg_stop         0s cpu:0s avg:0
0:01:04 INF [archive] - 	sort             1s cpu:3s avg:2.6
0:01:04 INF [archive] - 	  worker  1x(49% 0.7s)
0:01:04 INF [archive] - 	archive          19s cpu:1m9s avg:3.6
0:01:04 INF [archive] - 	  read    1x(3% 0.5s wait:17s)
0:01:04 INF [archive] - 	  encode  4x(55% 10s wait:2s)
0:01:04 INF [archive] - 	  write   1x(21% 4s wait:13s)
0:01:04 INF [archive] - ----------------------------------------
0:01:04 INF [archive] - 	archive	108MB
0:01:04 INF [archive] - 	features	281MB
-rw-r--r-- 1 runner docker 84M Jun 27 11:16 run.jar

Full logs: https://github.com/onthegomap/planetiler/actions/runs/9695530807

@msbarry msbarry merged commit cf534c1 into main Jun 27, 2024
12 checks passed
@msbarry msbarry mentioned this pull request Jun 27, 2024