-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Single-pass stats, pyramids, and histogram #116
Conversation
… sub-sampling for pyramids
…seems to go Very Slowly
…emove attempt at flushing pyramid block from Python, as this works, but slows everything down. Will have to rely on the KEA C++ fix
…EST', which it probably should have been all along
…ode with single-pass calculations. More to do, but so far seems to work
… use in imagewriter. Can now manage each of the three actions independently
…d imagename= parameter for setApproxStats. Add check on use of approx stats on thematic outputs
…ows considerable promise
Hi @neilflood. These single pass calculations are a great idea. I have noted that |
Hi @tonykgill Doing NEAREST is easy using just I also used to generally use AVERAGE for continuous variables, but have often thought that it does not matter much. In general I never use the pyramids for calculations, only for viewing convenience, and so I don't think that the use of NEAREST is really any disadvantage. If you think it might be important for you, please speak up. I am quite prepared to look into either of the above approaches (or any other) if there is a real need. I just thought it probably didn't matter, so took the simplest path. |
Thanks Neil. Happy to leave it. Having single pass on the stats and histograms is still a step forward. Just FYI, we use average resampling for products that are viewed by users in our web maps. We have found that average creates a nicer product, and better user experience, than nearest. It reduces much of the speckle in a product that is relatively spatiallly noisy at full resolution. I think if the product wasn't noisy, then nearest would probably be ok. |
OK, that's interesting to know. That does sound like there are situations where it matters, so I will keep that in mind, and look for a good solution. |
That's very generous of you Neil. Thank you. |
Add single-pass calculation for pyramid layers (i.e. overviews), basic statistics, and histograms, on output image files.
Purpose
In the past, RIOS always used GDAL's own functions to add pyramid layers, statistics and histograms to all output files. These were done after the user processing had completed and the output files were already written. Each of these three components needed a separate call to GDAL, operating on the completed file, and so each required a separate pass through the already-written output rasters.
This pull request adds the ability to do all of these in the same single pass as the processing and writing of each block. This gives a useful saving in time spent on these tasks. For smaller images (< 8,000 x 8,000 pixels), the time spent was never very large anyway, but for much larger images, there was considerable time spent, and so the time saved by doing these with a single pass is also considerable.
The default behaviour is to attempt to use single-pass for all three components, wherever possible, and to fall back on using GDAL when it is not.
Each of the three components is now handled much more independently of the others. This includes new controls methods -
setOmitBasicStats
andsetOmitHistogram
, which match with the previoussetOmitPyramids
. These provide more granular control than the previoussetCalcStats
method, although the latter still works as always.There are also three new methods to force the use of single-pass or GDAL -
setSinglePassPyramids
,setSinglePassBasicStats
, andsetSinglePassHistogram
.The user should not need to make any changes to existing code, and will gain useful speed-ups in writing output files.
Driver Support
As this code was developed, it became apparent that the KEA driver did not support the direct writing of pyramid layers. Further investigation fixed this problem (OSGeo/gdal#10616), and it should be available in GDAL 3.9.3. To guard against this, the single-pass code tests for driver support (in any driver), and will fall back to at-end GDAL pyramid layers if required.
Timings
The timings object on the ApplierReturn now includes some new timers. Originally, the
closing
timer included the time spent on flushing the final blocks of output, and also all of the time spent in writing pyramids, statistics, and histogram. The final flush is now included in thewriting
timer (which is thus more accurate), and theclosing
timer has been replaced with separate timers forpyramids
,basicstats
, andhistogram
, which accumulate the time spent both calculating and writing the corresponding item.Pyramid Layer Aggregation Type
The default aggregation type used for creating pyramid layers with GDAL used to be 'AVERAGE', unless the file had been explicitly set as 'thematic', in which case it was 'NEAREST'. However, it is quite common that outputs are never set to be either thematic or athematic, and so 'AVERAGE' would end up being used more commonly than is appropriate. So, the default is now 'NEAREST' in all cases, which is more than adequate for most purposes.
Furthermore, the use of single-pass pyramids only supports 'NEAREST', so if anything else is explicitly selected, then single-pass is not possible and it will fall back to GDAL.
Centering of low-resolution pixels
When GDAL's BuildOverviews() function uses 'NEAREST' to subsample the raster, it selects the value from the top-left corner of the low resolution pixel. This is probably a minor bug in GDAL. When single-pass pyramids are used in RIOS, they will use the value at the centre of the low resolution pixel, which removes any sub-pixel shift. This is not a major issue either way, but it seemed good to get it right.
Test Suite
A lot of tests have been added for all these statistics calculations, checking various different combinations. As a result, the TESTSTATS set of tests now takes noticeably longer, but my paranoia is satisfied. The whole suite is still under 30 seconds, so not too serious.
Histogram Minimum
The original code would always set the histogram minimum value to be zero for thematic layers, and all 8-bit layers, and use the actual minimum for all other cases. I suspect it was unnecessary, and possibly a bad idea. However, it does appear to have been relied upon in other situations, most notably when using the RAT
ReadAsArray()
andWriteArray()
functions to access the Histogram column. Because of this, I have preserved the old behaviour.