-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Breaking: Standardise cf standards and missingval handling #695
base: main
Are you sure you want to change the base?
Conversation
How about inverting it? Instead of "cf" call it "raw". If raw=false, offset and scale are applied, if raw=true, offset and scale are NOT applied? And raw=true would be the default? Maybe "navalue" instead of "maskingval"? |
Currently I've got Main concern with |
Yes, you are right that with "raw" one may expect that data were not touched at all. Indeed, "navalue" originates from my heavy R usage! And I agree NA is not very julia way. Point taken:) |
Thanks for the input though, good to have all the options to choose from |
Coming back to this I like how short and obvious But to mean no scaling and no masking - so it will override everything else. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #695 +/- ##
==========================================
- Coverage 82.59% 81.79% -0.80%
==========================================
Files 60 64 +4
Lines 4510 4884 +374
==========================================
+ Hits 3725 3995 +270
- Misses 785 889 +104 ☔ View full report in Codecov by Sentry. |
@@ -0,0 +1,20 @@ | |||
module GeometryOpsDimensionalDataExt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we close the PR that does this in GeometryOps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I must have added that by mistake lol
|
||
""" | ||
create([f!], [filename], template; kw...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When should I use this as opposed to Raster(data, dims; ...)
? What are the advantages beyond efficiency in writing to file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or is the direct constructor going to call create
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well Raster(filename)
reads an existing file... create(filename)
creates one.
So this is best for building something new from scratch, Raster
is best for opening something, rewrapping a DimArray, etc. Otherwise you need to define the Raster in-memory then write
it as separate steps.
(create
is already called in a lot of places under the hood where a new file might be created, like in nearly all methods with a filename
keyword. It's been here for years, just not publicly)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, I guess I'm not entirely clear on what to use when creating a purely in memory dataset. Does the pure Raster(data, dims; kwargs...)
constructor have the same options? Should it? That's the most intuitive option IMO to create an in memory raster...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it will be pretty much identical for in-memory rasters. But if you want chunks and e.g. compression options you have to do it separately in write
. An in-memory raster doesn't have chunks or options like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet that sounds good. Will add a suggestion for a line in the docstring to clarify.
Co-authored-by: Anshul Singhvi <[email protected]>
This fixes a strange issue I was having where Rasters.jl read the wrong values from a Kerchunk/Zarr dataset.
To ignore `scale` and `offset` metadata, use `scaled=false`. If `scale` and | ||
Note: `offset` are `1.0` and `0.0` they will be ignored and the original type will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To ignore `scale` and `offset` metadata, use `scaled=false`. If `scale` and | |
Note: `offset` are `1.0` and `0.0` they will be ignored and the original type will | |
To ignore `scale` and `offset` metadata, use `scaled=false`. Note: If `scale` | |
and `offset` are `1.0` and `0.0` they will be ignored and the original type will |
|
||
const COERCE_KEYWORD = """ | ||
- `coerce`: where `scale` and/or `offset` are present during `setindex!` to disk, | ||
coerce values to the disk type. `convert` is the default, but `round`, `trunc` or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the disk type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh it should be eltype
src/methods/shared_docstrings.jl
Outdated
- `maskingval`: A value to convert `missingval` to, by default `missing`. If this is set it | ||
will be the return value of `missingval(raster)` - `maskingval` becomes the new `missingval`. | ||
Setting `maskingval` to `nothing` means no masking will occur, and the original `missingval` | ||
will be the final `missingval`. This can give better performance than using `missing`. | ||
Another efficient option is to use e.g. `zero(eltype(raster))` to replace missing values with zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see why you picked this but I'm not totally happy about the name. Still, I can't see any other nice options. missingval
would have been nice but it's already in use. I'd say we should go for it. A sufficiently good docstring will clarify all problems :P
Maybe emphasize that this does what replace_missing
does?
- `maskingval`: A value to convert `missingval` to, by default `missing`. If this is set it | |
will be the return value of `missingval(raster)` - `maskingval` becomes the new `missingval`. | |
Setting `maskingval` to `nothing` means no masking will occur, and the original `missingval` | |
will be the final `missingval`. This can give better performance than using `missing`. | |
Another efficient option is to use e.g. `zero(eltype(raster))` to replace missing values with zero. | |
- `maskingval`: A value to convert `missingval` to, by default `missing`. If this is set it | |
will be the return value of `missingval(raster)` - `maskingval` becomes the new `missingval`. | |
This is the same thing that `replace_missing(raster, some_value)` does. | |
Setting `maskingval` to `nothing` means no masking will occur, and the original `missingval` | |
will be the final `missingval`. This can give better performance than using `missing`. | |
Another efficient option is to pass e.g. `zero(eltype(raster))` to replace missing values with zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also what CommonDataModel uses so there's precedent... But sharing the mask part with the mask
function is kinda weird for me
|
||
const WRITE_MISSINGVAL_KEYWORD = """ | ||
- `missingval`: set the missing value (i.e. FillValue / nodataval) of the written raster, | ||
as Julias `missing` cannot be stored. If not passed in, `missingval` will be detected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as Julias `missing` cannot be stored. If not passed in, `missingval` will be detected | |
as Julia's `missing` cannot be stored. If not passed in, `missingval` will be detected |
This PR standardises cf and missingval behaviour accross all sources.
New keywords are:
cf=true
: applyoffset
andscale
if the are not zero and one.maskingval=missing
this is the value the missingval will be converted to if there is one.I don't like either of these names! if you can think of anything better please suggest something.
@felixcremer @meggart this is the new diskarrays object that does 2-way cf and missingval/maskingval conversions
https://github.com/rafaqz/Rasters.jl/compare/cf?expand=1#diff-1478a88a8bdcffcf7f556cc1865dbe18d69b5c03f75b4481e1f5889e8a6b6b73