The spatial index is always an external catalog, specific to the application (e.g. PostgreSQL, Elasticsearch).
Assets are going to be large files like COGs, and we're going to figure out how to sample them.
Reading/reprojecting/resampling a window should be inlined in the IO stage,
i.e. avoid using Spark shuffles to do those things when they can be done at read time.
Tile reads will be deferred until the last possible moment.
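To make the last principle concrete, here is a minimal sketch of a deferred tile read. Everything in it (`Window`, `CountingSource`, `TileRef`, `plan`) is a hypothetical stand-in invented for illustration, not GeoTrellis or RasterFrames API. The source only counts reads, so it is easy to see that planning the windows costs no IO.

```scala
object DeferredReadSketch {
  /** Pixel window inside a source raster (hypothetical stand-in for a real window type). */
  final case class Window(col: Int, row: Int, cols: Int, rows: Int)

  /** Hypothetical source that counts reads so the deferral can be observed. */
  final class CountingSource(val name: String) {
    private var reads = 0
    def readCount: Int = reads
    def read(w: Window): Array[Double] = {
      reads += 1
      Array.fill(w.cols * w.rows)(1.0) // pretend pixel data
    }
  }

  /** A reference to pixels that have not been read yet. */
  final case class TileRef(source: CountingSource, window: Window) {
    // IO happens on first access only; later accesses reuse the result
    lazy val tile: Array[Double] = source.read(window)
  }

  /** Plan a square grid of windows over one source without touching its pixels. */
  def plan(source: CountingSource, tileSize: Int, tilesPerSide: Int): Seq[TileRef] =
    for {
      r <- 0 until tilesPerSide
      c <- 0 until tilesPerSide
    } yield TileRef(source, Window(c * tileSize, r * tileSize, tileSize, tileSize))
}
```

With this shape, sampling or filtering the planned references before touching `tile` only pays for the windows that survive, which is the effect the principles are after.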
This is how these principles may play out in the RasterFrames API when we introduce the idea of a Workspace for output rasters.
object UseCaseRF {
  // we are in Zeppelin ... we got geotiffs
  val catalog = StacCatalog("http://landsat.pds/catalog.json")
  val items: Seq[StacItem] =
    catalog.query(
      provider = "NAIP",
      bbox = ???,
      timeRange = ???,
      Tag("eo:cloud_cover") < 0.5)

  // Let's separate the act of querying for assets from the act of reading them.
  // If scenes remain in their native sizes, e.g. 800MB per LC8 scene, it's nearly
  // certain that the scene selection and filtering can happen fully on the driver.
  val assets: Seq[StacAsset] = items.flatMap( ??? ) // list out / filter the scenes

  /** Pixel layout and projection for our results */
  case class Workspace(crs: CRS, layout: LayoutDefinition)
  // ... Because we're pre-planning the reads to inline reprojection and resampling,
  // we will need to know the target pixel grid.

  // Case 1: we know the target, we're building a pyramid
  val workspace = Workspace
    .fromPyramid(crs = WebMercator, level = 13)
    .build

  // Case 2: we're inferring the layout from assets, they better match
  val workspace = Workspace
    .from(assets)
    .build
  // ... check there is one CRS, use it or throw
  // ... check they are grid aligned, use it or throw

  // Case 3: We know something, but not too much
  val workspace = Workspace
    .from(assets) // we checked that CRS is good, why did we do that?
    .align(assets.head, NearestNeighbor) // I guess the first one is good enough
    .build
  // evidence: T: Asset => RasterSource(extent, crs, cols/rows) { def read(bbox: RasterExtent): Tile }
  val rf = assets.toRF(workspace)
  // - we are going to look at each asset
  // - blow it out into tiles from workspace
  //   => 1 asset -> n rows
  //   => we're going to get multiple tiles per key
  //   => DataFrame join/shuffle will be cheap to bring those keys together

  /** Something that represents delayed tile read */
  case class RasterRef(re: RasterExtent, crs: CRS, source: RasterSource)
  // .. these will not be visible to the user but will be generated under the covers

  // Actually, I KNOW that these assets are somehow part of the layer.
  // So when I sample, I better sample across scenes .. so what tells me to do that...
  val mergeFunction: Seq[Double] => Double = ???
  val rf = assets.grouped(mergeFunction).toRF(workspace)
  // - we're going to look at each tile we will produce
  // - we're going to make RasterRef that joins pixels from each overlapping asset
  //   => We need RasterRef because we need to group multiple sources per target tile
  // So this is probably not true, we will want to rely on the join/shuffle in RDD space to bring the RasterRefs together
  // So this part of API is suspect for RasterFrame/RDD case

  // this is only going to do 5% of IO
  rf.sample(0.05).select(aggregateHistogram("col1"))
}
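The "1 asset -> n rows" step in the sketch above can be illustrated with plain arithmetic. The following is a rough, self-contained version under strong assumptions: `Extent`, `Key`, `Grid` and `Asset` are simplified stand-ins made up here (not the GeoTrellis types of similar names), the grid is a regular set of square tiles anchored at its top-left corner, and every asset is already in the grid's CRS. No clipping to the grid bounds is done.

```scala
object GridPlanSketch {
  final case class Extent(xmin: Double, ymin: Double, xmax: Double, ymax: Double)
  final case class Key(col: Int, row: Int)

  /** Hypothetical stand-in for a layout: square tiles, rows counted down from the top edge. */
  final case class Grid(extent: Extent, tileSize: Double) {
    /** Keys of all grid tiles that the given extent overlaps (not clipped to the grid). */
    def keysFor(e: Extent): Seq[Key] = {
      val c0 = math.floor((e.xmin - extent.xmin) / tileSize).toInt
      val c1 = math.ceil((e.xmax - extent.xmin) / tileSize).toInt - 1
      val r0 = math.floor((extent.ymax - e.ymax) / tileSize).toInt
      val r1 = math.ceil((extent.ymax - e.ymin) / tileSize).toInt - 1
      for (r <- r0 to r1; c <- c0 to c1) yield Key(c, r)
    }
  }

  /** Hypothetical asset: just a name and a footprint. */
  final case class Asset(name: String, extent: Extent)

  /** Blow each asset out into one row per key, then bring rows with the same key together. */
  def plan(assets: Seq[Asset], grid: Grid): Map[Key, Seq[String]] =
    assets
      .flatMap(a => grid.keysFor(a.extent).map(k => k -> a.name))
      .groupBy(_._1)
      .map { case (k, pairs) => k -> pairs.map(_._2) }
}
```

Keys that end up with more than one asset name are the places where some merge function (or a join/shuffle) would have to combine pixels from overlapping scenes.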
@metasim Nope, that's just flavor. I was just thinking on the spot that a hypothetical STAC query method would need some type to represent filters against optional or user-defined tags, whereas required fields can be vanilla parameters.
Edit:
I know you broached the idea that LayerQuery could be better if it assembled the filters into an expression tree instead of eagerly evaluating them. In general, this is linked to that idea.
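For what it's worth, a tag filter that builds an expression tree instead of evaluating eagerly might look roughly like this. It is only a sketch: `Filter`, `LessThan`, `GreaterThan`, `And`, this `Tag` and `eval` are all invented for illustration and are not part of any existing STAC client or of LayerQuery.

```scala
object FilterSketch {
  /** A filter is data describing a predicate; nothing is evaluated when it is built. */
  sealed trait Filter {
    def &&(other: Filter): Filter = And(this, other)
  }
  final case class LessThan(tag: String, value: Double) extends Filter
  final case class GreaterThan(tag: String, value: Double) extends Filter
  final case class And(left: Filter, right: Filter) extends Filter

  /** Hypothetical handle for an optional or user-defined tag. */
  final case class Tag(name: String) {
    def <(v: Double): Filter = LessThan(name, v)
    def >(v: Double): Filter = GreaterThan(name, v)
  }

  /** One possible interpreter: evaluate against an item's tags; a missing tag fails the comparison. */
  def eval(f: Filter, tags: Map[String, Double]): Boolean = f match {
    case LessThan(t, v)    => tags.get(t).exists(_ < v)
    case GreaterThan(t, v) => tags.get(t).exists(_ > v)
    case And(l, r)         => eval(l, tags) && eval(r, tags)
  }
}
```

Because `Tag("eo:cloud_cover") < 0.5` is just a value here, the same tree could in principle be translated into a catalog query instead of being run in memory; `eval` is only the simplest interpreter.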
This is a record of a chat with @metasim.