support sfheaders #251

Open
xiaodaigh opened this issue Jan 6, 2020 · 1 comment
Comments

@xiaodaigh
Collaborator

xiaodaigh commented Jan 6, 2020

https://github.com/dcooley/sfheaders

dcooley/sfheaders#40

@mdsumner

mdsumner commented Mar 14, 2020

I had a look at an example, because you made me think of scanning through a vector source without holding the entire sf object in memory. This uses the virtual FID field from GDAL to read one feature at a time:

library(disk.frame)

df_path <- file.path(tempdir(), "disk_frame_sf")
diskf <- disk.frame(df_path)

sfsrc <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)
(layer <- sf::st_layers(sfsrc)$name[1L])
#> [1] "nc.gpkg"
## find out how many features and what the first FID is (it varies)
cnt <- sf::read_sf(sfsrc, query = sprintf("SELECT MIN(FID) AS minfid, COUNT(*) AS n_features FROM [%s]", layer))
#> Warning: no simple feature geometries present: returning a data.frame or
#> tbl_df
offset <- if (cnt$minfid == 0) 1 else 0
## scan a feature at a time
for (i in seq_len(cnt$n_features) ) {
  sf0 <- sf::read_sf(sfsrc, query = sprintf("SELECT * FROM [%s] WHERE FID == %i", 
                                              layer, i - offset))
  add_chunk(diskf, sfheaders::sf_to_df(sf0))
}

diskf
#> path: "/tmp/RtmpIC4Ica/disk_frame_sf"
#> nchunks: 100
#> nrow (at source): 2529
#> ncol (at source): 6
#> nrow (post operations): ???
#> ncol (post operations): ???

Created on 2020-03-14 by the reprex package (v0.3.0)

I wonder what kind of workflows you are envisioning?

There are many other options here. Querying by layer and FID is awkward via SQL, but with vapour (for example) we can scan or skip over geometries or attributes arbitrarily (sf is still needed to convert from binary).
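As one possible variation on the loop above (a rough sketch, not using vapour), features could be read in batches rather than one at a time. This assumes the GeoPackage driver passes `LIMIT` / `OFFSET` through to SQLite, and that the installed sfheaders supports `fill = TRUE` in `sf_to_df()` to carry the attribute columns along with the coordinates. The names `chunk_size`, `df_path2` and `diskf2` are made up for illustration.

```r
library(disk.frame)

sfsrc <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)
layer <- sf::st_layers(sfsrc)$name[1L]

## count features; sf warns because the result has no geometry column
cnt <- sf::read_sf(sfsrc, query = sprintf("SELECT COUNT(*) AS n FROM [%s]", layer))
n_features <- as.integer(cnt$n)

## hypothetical batch size and output path, chosen only for this sketch
chunk_size <- 25L
df_path2 <- file.path(tempdir(), "disk_frame_sf_batched")
diskf2 <- disk.frame(df_path2)

## row offsets for each batch: 0, 25, 50, ...
starts <- seq(0L, n_features - 1L, by = chunk_size)

for (skip in starts) {
  ## read one batch of features (assumes LIMIT/OFFSET is honoured for this source)
  sf0 <- sf::read_sf(sfsrc, query = sprintf("SELECT * FROM [%s] LIMIT %d OFFSET %d",
                                            layer, chunk_size, as.integer(skip)))
  ## fill = TRUE repeats each feature's attributes on its coordinate rows
  add_chunk(diskf2, sfheaders::sf_to_df(sf0, fill = TRUE))
}
```

Note that `sf_to_df()` numbers features within each call, so ids such as `sfg_id` restart in every chunk; a global feature id would need to be added separately if chunks are to be recombined.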
