Skip to content

Commit

Permalink
landmap v0.8
Browse files Browse the repository at this point in the history
  • Loading branch information
thengl committed Apr 11, 2021
1 parent de1248f commit 6531ff2
Show file tree
Hide file tree
Showing 6 changed files with 100 additions and 25 deletions.
1 change: 1 addition & 0 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ Imports:
parallelMap,
sp,
rgdal,
meteo,
gdalUtils,
raster
Suggests: geoR, ParamHelpers, mda, psych, spdep, fossil, xgboost, plyr, kernlab, nnet, rjson, spatstat, maptools, maxlike, RCurl, aqp, deepnet, RSAGA, soiltexture, snowfall, plotKML, boot
Expand Down
71 changes: 57 additions & 14 deletions R/train.spLearner.R
Original file line number Diff line number Diff line change
Expand Up @@ -160,12 +160,14 @@ setMethod("train.spLearner", signature(observations = "data.frame", formulaStrin
#' @param cov.model Covariance model for variogram fitting,
#' @param subsample For large datasets consider random subsetting training data,
#' @param parallel logical, Initiate parallel processing,
#' @param buffer.dist Specify whether to use buffer distances to points as covariates,
#' @param oblique.coords Specify whether to use oblique coordinates as covariates,
#' @param nearest Specify whether to use nearest values and distances i.e. the method of Sekulic et al. (2020),
#' @param buffer.dist Specify whether to use buffer distances to points as covariates,
#' @param theta.list List of angles (in radians) used to derive oblique coordinates,
#' @param spc specifies whether to apply principal components transformation.
#' @param id Id column name to control clusters of data,
#' @param weights Optional weights (per row) that learners will use to account for variable data quality,
#' @param n.obs Number of nearest observations to be found in \code{meteo::near.obs} (by default 10),
#' @param ... other arguments that can be passed on to \code{mlr::makeStackedLearner},
#'
#' @importClassesFrom sp SpatialPixelsDataFrame SpatialPointsDataFrame
Expand All @@ -176,19 +178,23 @@ setMethod("train.spLearner", signature(observations = "data.frame", formulaStrin
#' @author \href{https://opengeohub.org/people/tom-hengl}{Tom Hengl}
#'
#' @note By default uses oblique coordinates (rotated coordinates) as described in Moller et al. (2020; \doi{10.5194/soil-6-269-2020}) to account for geographical distribution of values.
#' By setting the \code{nearest = TRUE}, distances to nearest observations and values of nearest neighbors will be used (see: Sekulic et al, 2020; \doi{10.3390/rs12101687}). This method closely resembles geostatistical interpolators such as kriging.
#' Buffer geographical distances can be added by setting \code{buffer.dist=TRUE}.
#' Using either oblique coordinates and/or buffer distances is not recommended for point data set with distinct spatial clustering.
#' Effects of adding geographical distances into modeling are explained in detail in Hengl et al. (2018; \doi{10.7717/peerj.5518}).
#' Effects of adding geographical distances into modeling are explained in detail in Hengl et al. (2018; \doi{10.7717/peerj.5518}) and Sekulic et al. (2020; \doi{10.3390/rs12101687}).
#' Default learners used for regression are: \code{c("regr.ranger", "regr.ksvm", "regr.nnet", "regr.cvglmnet")}.
#' Default learners used for classification / binomial variables are: \code{c("classif.ranger", "classif.svm", "classif.multinom")}, with \code{predict.type="prob"}.
#' When using \code{method = "stack.cv"} each training and prediction round could produce somewhat different results due to randomization of CV.
#' Prediction errors are derived by default using quantreg (Quantile Regression) option in the ranger package (\href{https://jmlr.org/papers/v7/meinshausen06a.html}{Meinshausen, 2006}).
#' Prediction errors are derived by default using the \code{forestError} package method described in Lu & Hardin (2021).
#' Optionally, the quantreg (Quantile Regression) option from the ranger package (\href{https://jmlr.org/papers/v7/meinshausen06a.html}{Meinshausen, 2006}) can also be used.
#'
#' @references
#' \itemize{
#' \item Moller, A. B., Beucher, A. M., Pouladi, N., and Greve, M. H. (2020). Oblique geographic coordinates as covariates for digital soil mapping. SOIL, 6, 269–289. \doi{10.5194/soil-6-269-2020}
#' \item Hengl, T., Nussbaum, M., Wright, M. N., Heuvelink, G. B., and Graler, B. (2018) Random Forest as a generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ 6:e5518. \doi{10.7717/peerj.5518}
#' \item Lu, B., & Hardin, J. (2021). A unified framework for random forest prediction error estimation. Journal of Machine Learning Research, 22(8), 1–41. \url{https://jmlr.org/papers/v22/18-558.html}
#' \item Meinshausen, N. (2006). \href{https://jmlr.org/papers/v7/meinshausen06a.html}{Quantile regression forests}. Journal of Machine Learning Research, 7(Jun), 983–999. \url{https://jmlr.org/papers/v7/meinshausen06a.html}
#' \item Sekulic, A., Kilibarda, M., Heuvelink, G. B., Nikolic, M. & Bajat, B. (2020). Random Forest Spatial Interpolation. Remote. Sens. 12, 1687, \doi{10.3390/rs12101687}
#' }
#'
#' @examples
Expand Down Expand Up @@ -241,6 +247,22 @@ setMethod("train.spLearner", signature(observations = "data.frame", formulaStrin
#' main="Prediction intervals (alpha = 0.318)")
#' dev.off()
#'
#' ## Method from https://doi.org/10.3390/rs12101687
#' library(meteo)
#' mN <- train.spLearner(meuse["zinc"], covariates=meuse.grid[,c("dist","ffreq")],
#' parallel=FALSE, lambda = 0, nearest=TRUE)
#' meuse.N <- predict(mN)
#' ## Plot of predictions and prediction error (RMSPE)
#' op <- par(mfrow=c(1,2), oma=c(0,0,0,1), mar=c(0,0,4,3))
#' plot(raster(meuse.N$pred["response"]), col=R_pal[["rainbow_75"]][4:20],
#' main="Predictions spLearner meteo::near.obs", axes=FALSE, box=FALSE)
#' points(meuse, pch="+")
#' plot(raster(meuse.N$pred["model.error"]), col=rev(bpy.colors()),
#' main="Prediction errors", axes=FALSE, box=FALSE)
#' points(meuse, pch="+")
#' par(op)
#' dev.off()
#'
#' ## Classification:
#' SL.library <- c("classif.ranger", "classif.xgboost", "classif.nnTrain")
#' mC <- train.spLearner(meuse["soil"], covariates=meuse.grid[,c("dist","ffreq")],
Expand All @@ -254,9 +276,10 @@ setMethod("train.spLearner", signature(observations = "data.frame", formulaStrin
#' ## SIC1997
#' data("sic1997")
#' X <- sic1997$swiss1km[c("CHELSA_rainfall","DEM")]
#' mR <- train.spLearner(sic1997$daily.rainfall, covariates=X, lambda=1, parallel=FALSE)
#' mR <- train.spLearner(sic1997$daily.rainfall, covariates=X, lambda=1,
#' nearest = TRUE, parallel=FALSE)
#' summary(mR@spModel$learner.model$super.model$learner.model)
#' rainfall1km <- predict(mR)
#' rainfall1km <- predict(mR, what="mspe")
#' op <- par(mfrow=c(1,2), oma=c(0,0,0,1), mar=c(0,0,4,3))
#' plot(raster(rainfall1km$pred["response"]), col=R_pal[["rainbow_75"]][4:20],
#' main="Predictions spLearner", axes=FALSE, box=FALSE)
Expand All @@ -265,6 +288,7 @@ setMethod("train.spLearner", signature(observations = "data.frame", formulaStrin
#' main="Prediction errors", axes=FALSE, box=FALSE)
#' points(sic1997$daily.rainfall, pch="+")
#' par(op)
#' dev.off()
#'
#' ## Ebergotzen data set
#' data(eberg_grid)
Expand Down Expand Up @@ -301,7 +325,7 @@ setMethod("train.spLearner", signature(observations = "data.frame", formulaStrin
#' }
#' @export
#' @docType methods
setMethod("train.spLearner", signature(observations = "SpatialPointsDataFrame", formulaString = "ANY", covariates = "SpatialPixelsDataFrame"), function(observations, formulaString, covariates, SL.library, family = stats::gaussian(), method = "stack.cv", predict.type, super.learner = "regr.lm", subsets = 5, lambda = 0.5, cov.model = "exponential", subsample = 10000, parallel = "multicore", buffer.dist = FALSE, oblique.coords = TRUE, theta.list=seq(0, 180, length.out = 14)*pi/180, spc = TRUE, id = NULL, weights = NULL, ...){
setMethod("train.spLearner", signature(observations = "SpatialPointsDataFrame", formulaString = "ANY", covariates = "SpatialPixelsDataFrame"), function(observations, formulaString, covariates, SL.library, family = stats::gaussian(), method = "stack.cv", predict.type, super.learner = "regr.lm", subsets = 5, lambda = 0.5, cov.model = "exponential", subsample = 10000, parallel = "multicore", oblique.coords = TRUE, nearest = FALSE, buffer.dist = FALSE, theta.list=seq(0, 180, length.out = 14)*pi/180, spc = TRUE, id = NULL, weights = NULL, n.obs = 10, ...){
if(missing(formulaString)){
tv <- names(observations)[1]
observations <- observations[!is.na(observations@data[,tv]),]
Expand All @@ -323,6 +347,13 @@ setMethod("train.spLearner", signature(observations = "SpatialPointsDataFrame",
covariates <- covariates[-ncol(covariates)]
}
}
if(nearest==TRUE){
message("Deriving nearest observations using meteo::near.obs...", immediate. = TRUE)
nearest_obs <- meteo::near.obs(locations = covariates, observations = observations[tv], zcol = tv, n.obs = n.obs, rm.dupl = TRUE)
nearest_obs.dev <- meteo::near.obs(locations = observations[tv], observations = observations[tv], zcol = tv, n.obs = n.obs, rm.dupl = TRUE)
covariates <- sp::cbind.Spatial(covariates, sp::SpatialPixelsDataFrame(covariates@coords, nearest_obs, proj4string = sp::CRS(sp::proj4string(covariates))))
oblique.coords <- FALSE; buffer.dist <- FALSE
}
if(buffer.dist==TRUE){
message("Deriving buffer distances to points...", immediate. = TRUE)
classes <- as.factor(1:nrow(observations))
Expand Down Expand Up @@ -356,13 +387,22 @@ setMethod("train.spLearner", signature(observations = "SpatialPointsDataFrame",
}
}
ov <- model.data(observations, formulaString, covariates)
if(nearest==TRUE){
## replace values
ov[,names(nearest_obs.dev)] <- nearest_obs.dev
}
r.sel <- stats::complete.cases(ov[,all.vars(formulaString)])
n.r.sel <- sum(r.sel)
if(!n.r.sel==nrow(observations)){
message(paste0("Subsetting observations to ", round(n.r.sel/nrow(observations), 2)*100, "% complete cases..."), immediate. = TRUE)
}
m <- train.spLearner.matrix(observations = ov, formulaString = formulaString, covariates = covariates, SL.library = SL.library, family = family, subsets = subsets, lambda = lambda, cov.model = cov.model, subsample = subsample, parallel = parallel, id=id, weights=weights, predict.type = predict.type, ...)
return(m)
if(nearest==TRUE){
## skip variogram modeling
m <- train.spLearner.matrix(observations = ov, formulaString = formulaString, covariates = covariates, SL.library = SL.library, family = family, subsets = subsets, lambda = lambda, cov.model = "nugget", subsample = subsample, parallel = parallel, id=id, weights=weights, predict.type = predict.type, ...)
} else {
m <- train.spLearner.matrix(observations = ov, formulaString = formulaString, covariates = covariates, SL.library = SL.library, family = family, subsets = subsets, lambda = lambda, cov.model = cov.model, subsample = subsample, parallel = parallel, id=id, weights=weights, predict.type = predict.type, ...)
}
return(m)
})

#' Overlay points and grids and prepare regression matrix
Expand Down Expand Up @@ -443,12 +483,13 @@ model.data <- function(observations, formulaString, covariates, dimensions=c("2D
#' @param w optional weights vector.
#' @param quantiles Lower and upper quantiles for quantreg forest (0.159 and 0.841 for 1 standard deviation).
#' @param n.cores Number of cores to use (for parallel computation in ranger).
#' @param what A vector of characters indicating what estimates are desired for the quantForestError.
#' @param ... optional parameters.
#'
#' @return Object of class \code{SpatialPixelsDataFrame} with predictions and model error.
#' @method predict spLearner
#' @export
"predict.spLearner" <- function(object, predictionLocations, model.error=TRUE, error.type=c("forestError", "quantreg", "weighted.sd", "interval")[1], t.prob=1/3, w, quantiles = c((1-.682)/2, 1-(1-.682)/2), n.cores = parallel::detectCores(), ...){
"predict.spLearner" <- function(object, predictionLocations, model.error=TRUE, error.type=c("forestError", "quantreg", "weighted.sd", "interval")[1], t.prob=1/3, w, quantiles = c((1-.682)/2, 1-(1-.682)/2), n.cores = parallel::detectCores(), what = c("mspe", "bias", "interval"), ...){
if(any(object@spModel$task.desc$type=="classif")){
error.type <- "weighted.sd"
}
Expand Down Expand Up @@ -498,11 +539,13 @@ model.data <- function(observations, formulaString, covariates, dimensions=c("2D
m.train = object@spModel$learner.model$super.model$learner.model$model
m.terms = all.vars(object@spModel$learner.model$super.model$learner.model$terms)
## http://jmlr.org/papers/v22/18-558.html
pred.q = forestError::quantForestError(object@quantregModel, X.train=m.train[,m.terms[-1]], X.test=as.data.frame(out.c), Y.train=m.train[,m.terms[1]], alpha = (1-(quantiles[2]-quantiles[1])), n.cores = n.cores)
pred$model.error <- sqrt(pred.q$estimates$mspe)
pred$model.bias <- pred.q$estimates$bias
pred@data[,"q.lwr"] <- pred.q$estimates[,grep("lower", names(pred.q$estimates))]
pred@data[,"q.upr"] <- pred.q$estimates[,grep("upper", names(pred.q$estimates))]
pred.q = forestError::quantForestError(object@quantregModel, X.train=m.train[,m.terms[-1]], X.test=as.data.frame(out.c), Y.train=m.train[,m.terms[1]], what = what, alpha = (1-(quantiles[2]-quantiles[1])), n.cores = n.cores)
if(any(what %in% "mspe")){ pred$model.error <- sqrt(pred.q$mspe) }
if(any(what %in% "bias")){ pred$model.bias <- pred.q$bias }
if(any(what %in% "interval")){
pred@data[,"q.lwr"] <- pred.q[,grep("lower", names(pred.q))]
pred@data[,"q.upr"] <- pred.q[,grep("upper", names(pred.q))]
}
} else {
message("Deriving model errors using ranger package 'quantreg' option...", immediate. = TRUE)
pred.q = predict(object@quantregModel, as.data.frame(out.c), type="quantiles", quantiles=quantiles)
Expand Down
2 changes: 1 addition & 1 deletion R/tune.spLearner.R
Original file line number Diff line number Diff line change
Expand Up @@ -94,7 +94,7 @@ setMethod("tune.spLearner", signature(object = "spLearner"), function(object, nu
}
## feature selection
lrn1 <- mlr::makeFeatSelWrapper(lrn.rf, resampling = inner, control = ctrlF, show.info=TRUE)
var.mod1 <- mlr::train(lrn1, task = tsk0)
try( var.mod1 <- mlr::train(lrn1, task = tsk0), silent = TRUE)
if(!class(.Last.value)[1]=="try-error"){
var.sfeats1 <- mlr::getFeatSelResult(var.mod1)
} else {
Expand Down
6 changes: 3 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
[![Build Status](https://travis-ci.org/Envirometrix/landmap.svg?branch=master)](https://travis-ci.org/Envirometrix/landmap)
[![R-CMD-check](https://github.com/Envirometrix/landmap/workflows/R-CMD-check/badge.svg)](https://github.com/Envirometrix/landmap/actions)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/landmap)](https://cran.r-project.org/package=landmap)
[![Github_Status_Badge](https://img.shields.io/badge/Github-0.0--3-blue.svg)](https://github.com/Envirometrix/landmap)
[![Github_Status_Badge](https://img.shields.io/badge/Github-0.0--8-blue.svg)](https://github.com/Envirometrix/landmap)

Package provides methodology for automated mapping i.e. spatial interpolation and/or
prediction using **Ensemble Machine Learning** (extends functionality of the [mlr package](https://mlr.mlr-org.com/)). Key functionality includes:
Expand All @@ -27,7 +27,7 @@ is explained in detail in:
- Hengl, T., Nussbaum, M., Wright, M. N., Heuvelink, G. B., and Gräler, B. (2018).
[Random Forest as a generic framework for predictive modeling of spatial and spatio-temporal variables](https://doi.org/10.7717/peerj.5518). PeerJ 6:e5518.

Use of geographical distances as features in machine learning is also explained in detail in:
Use of geographical distances and nearest neighbors as features in machine learning is also explained in detail in:

- Møller, A. B., Beucher, A. M., Pouladi, N., and Greve, M. H. (2020). [Oblique geographic coordinates as covariates for digital soil mapping](https://doi.org/10.5194/soil-6-269-2020). SOIL, 6, 269–289, https://doi.org/10.5194/soil-6-269-2020
- Sekulić, A., Kilibarda, M., Heuvelink, G.B., Nikolić, M., Bajat, B. (2020). [Random Forest Spatial Interpolation](https://doi.org/10.3390/rs12101687). Remote Sens. 12, 1687. https://doi.org/10.3390/rs12101687
Expand Down Expand Up @@ -176,7 +176,7 @@ uncertainty around a single predicted value. It can be derived as:
- standard deviation (assumes symmetric distribution of errors),

As a default value for prediction intervals, landmap uses `quantiles = c((1-.682)/2, 1-(1-.682)/2)` so that s.d. can also
be derived from the upper and lower 68% quantile using:
be derived from the upper and lower 68% quantiles by using:

```r
pred.error <- (q.upr-q.lwr)/2
Expand Down
3 changes: 3 additions & 0 deletions man/predict.spLearner.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit 6531ff2

Please sign in to comment.