spatPredict {SAiVE} | R Documentation |
Predict spatial variables using machine learning
Description
Function to facilitate the prediction of spatial variables using machine learning, including the selection of a particular model and/or model parameters from several user-defined options. Both classification and regression are supported, but ensure that the models passed to the methods parameter are suitable for the problem type.
Note that you may be prompted to install supplementary packages, depending on the model types chosen and whether these have been run before; this function may not be 'set and forget'.
It is possible to specify multiple machine learning methods (the methods parameter) as well as method-specific parameters (the trainControl parameter) if you wish to test multiple options and select the best one. To facilitate method selection, refer to function modelMatch(). If you are unsure of the best model to use, you can use the fastCompare parameter to quickly compare models and select the best one based on accuracy. If you wish to use a single model and/or trainControl object, you can pass a single string to methods and a single trainControl object to trainControl.
Warning options are changed for this function only to show all warnings as they occur and reset back to their original state upon function completion (a test is done first to ensure it can be reset). This is to ensure that any warnings when running models are shown in sequence with the messages indicating the progress of the function, especially when running multiple models and/or trainControl options.
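The save-and-restore pattern described above can be sketched in base R as follows. This is a minimal illustration, assuming that "show all warnings as they occur" corresponds to options(warn = 1); the helper name with_immediate_warnings is hypothetical and is not part of SAiVE.

```r
# Hypothetical helper: run a function with warnings printed as they occur,
# then restore the caller's warning option even if the function errors.
with_immediate_warnings <- function(fun) {
  old <- options(warn = 1)           # options() returns the previous setting
  on.exit(options(old), add = TRUE)  # reset on exit, whatever happens
  fun()
}

before <- getOption("warn")
res <- with_immediate_warnings(function() {
  warning("shown immediately, in sequence with progress messages")
  "done"
})
after <- getOption("warn")
```

Using on.exit() rather than a manual reset at the end of the function means the original option is restored even when a model fails partway through.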
Usage
spatPredict(
  features,
  outcome,
  poly_sample = 1000,
  trainControl,
  methods,
  fastCompare = TRUE,
  fastFraction = NULL,
  thinFeatures = TRUE,
  predict = FALSE,
  n.cores = NULL,
  save_path = NULL
)
Arguments
features |
Independent variables. Must be either a NAMED list of terra spatRasters or a multi-layer (stacked) spatRaster (c(rast1, rast2)). All layers must have the same cell size, alignment, extent, and crs. These rasters should include the training extent (that covered by the spatVector in |
outcome |
Dependent variable, as a terra spatVector of points or polygons with a single attribute table column (of class integer, numeric or factor). The class of this column dictates whether the problem is approached as a classification or regression problem; see details. If specifying polygons, stratified random sampling will be done with |
poly_sample |
If passing a polygon SpatVector to |
trainControl |
Parameters used to control training of the machine learning model, created with |
methods |
A string specifying one or more classification/regression method(s) to use. Passed to the |
fastCompare |
If specifying multiple methods in |
fastFraction |
The fraction of points to use for the method comparison step (final training and testing is always done on the full data set) if |
thinFeatures |
Should random forest selection using |
predict |
TRUE will apply the trained model to the full extent of |
n.cores |
The maximum number of cores to use. Leave NULL to use all cores minus 1. |
save_path |
The path (folder) to which you wish to save the predicted raster. Not used unless |
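As an illustration of the two accepted forms of the features argument, the sketch below builds two small rasters on the same grid and combines them as a named list and as a stacked spatRaster. The layer names (elev, slope), the extent, and the random values are hypothetical; only terra functions are used.

```r
library(terra)

# Hypothetical layer: 10 x 10 cells over an arbitrary extent
elev <- rast(nrows = 10, ncols = 10,
             xmin = 0, xmax = 1000, ymin = 0, ymax = 1000,
             crs = "EPSG:3857", vals = runif(100))

# Second layer created from the first, so cell size, alignment,
# extent, and crs are guaranteed to match
slope <- rast(elev)
values(slope) <- runif(100)

# compareGeom() stops with an error by default if the geometries differ
same_geom <- compareGeom(elev, slope)

# Option 1: a NAMED list of spatRasters
features_list <- list(elev = elev, slope = slope)

# Option 2: a multi-layer (stacked) spatRaster
features_stack <- c(elev, slope)
names(features_stack) <- c("elev", "slope")
```

Deriving every layer from a common template, or checking with compareGeom() beforehand, is one way to catch mismatched geometries before they are passed to the function.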
Details
This function partly operates as a convenient means of passing various parameters to the caret::train() function, enabling the user to rapidly trial different model types and parameter sets. In addition, pre-processing of data can optionally be done using VSURF::VSURF() (parameter thinFeatures), which can decrease the time to run models by removing superfluous features (independent variables).
Value
If passing only one method to the methods argument: the outcome of the VSURF variable selection process (if thinFeatures is TRUE), the training and testing data.frames, the fitted model, model performance statistics, and the final predicted raster (if predict = TRUE).
If passing multiple methods to the methods argument: the outcome of the VSURF variable selection process (if thinFeatures is TRUE), the training and testing data.frames, character vectors for failed methods, methods which generated a warning, and what those errors and warnings were, the model performance comparison, the selected method, the trained model performance statistics, and the final predicted raster (if predict = TRUE).
In either case, the predicted raster is written to disk if save_path is specified.
Model testing, comparison, and reported metrics
After extracting raster values at n points from the features rasters, the point values are split spatially into training and testing sets along a 70/30 split. This is accomplished by creating a grid (1000*1000) of polygons over the extent of the points and randomly assigning polygons to the training or testing set. Points within these polygons are then assigned to the corresponding set, so that no grid polygon contributes points to both sets and the training and testing sets are spatially separated.
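The grid-based split described above can be sketched in base R as follows. This is an illustration of the idea only, not the package's implementation: the point data are simulated, the grid is a coarse 10 x 10 for readability, and the 70/30 split is applied to grid cells, so the share of points in each set is only approximately 70/30.

```r
set.seed(42)

# Hypothetical extracted point values: coordinates plus one feature
pts <- data.frame(x = runif(500, 0, 100),
                  y = runif(500, 0, 100),
                  value = rnorm(500))

n_side <- 10  # illustrative grid size (cells per side)

# Index of the grid column and row that each point falls into
col_id <- cut(pts$x, breaks = seq(min(pts$x), max(pts$x), length.out = n_side + 1),
              include.lowest = TRUE, labels = FALSE)
row_id <- cut(pts$y, breaks = seq(min(pts$y), max(pts$y), length.out = n_side + 1),
              include.lowest = TRUE, labels = FALSE)
pts$cell <- (row_id - 1) * n_side + col_id

# Randomly assign 70% of the occupied cells to training, the rest to testing
cells <- unique(pts$cell)
train_cells <- sample(cells, size = round(0.7 * length(cells)))

train <- pts[pts$cell %in% train_cells, ]
test  <- pts[!pts$cell %in% train_cells, ]
```

Because whole cells are assigned to one set, neighbouring points tend to end up on the same side of the split, which reduces the leakage that a purely random point-level split would allow.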
Method for selecting the best model:
When specifying multiple model types in methods, each model type and trainControl pair (if trainControl is a list of length equal to methods) is run using caret::train(). To speed things up you can use fastCompare = TRUE. Models are then compared on their 'accuracy' metric as output by caret::resamples() when run on the testing partition, and the highest-performing model is selected. If fastCompare is TRUE, this model is then run on the complete data set provided in outcome. Model statistics are returned upon function completion, which allows the user to select their own 'best performing' model based on other criteria if desired.
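The general train-then-compare loop can be sketched with caret as follows. This is a simplified illustration under several assumptions: it uses the built-in iris data instead of spatial data, a random (not spatial) partition, the "knn" and "lda" methods (the latter needs the MASS package), and test-set accuracy computed directly from predictions rather than through caret::resamples().

```r
library(caret)
set.seed(1)

# Random 70/30 partition for illustration only
idx <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
train_df <- iris[idx, ]
test_df  <- iris[-idx, ]

ctrl <- trainControl(method = "cv", number = 2)
candidate_methods <- c("knn", "lda")

# Train one model per candidate method with the same trainControl
fits <- lapply(candidate_methods, function(m) {
  train(Species ~ ., data = train_df, method = m, trControl = ctrl)
})
names(fits) <- candidate_methods

# Accuracy of each model on the held-out testing partition
acc <- vapply(fits, function(f) {
  mean(predict(f, newdata = test_df) == test_df$Species)
}, numeric(1))

best_method <- names(acc)[which.max(acc)]
```

Keeping the per-method accuracies (acc) alongside the selected method mirrors the point made above: the returned statistics let you choose a different model if accuracy alone is not the right criterion.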
Balancing classes in outcome (dependent) variable
Models can be biased if they are given significantly more points in one outcome class than in others, and best practice is to even out the number of points in each class. If extracting point values from a vector or raster object and passing a points vector object to this function, a simple way to do that is to use the "strata" parameter of terra::spatSample(). If working directly from points, caret::downSample() and caret::upSample() can be used. Note that if passing a polygons object to this function, stratified random sampling will automatically be performed.
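When working directly from a table of point values, class balancing with caret might look like the sketch below. The data frame, its column names (slope, class), and the class counts are made up for illustration.

```r
library(caret)
set.seed(7)

# Hypothetical point values with an imbalanced outcome: 100 "a" vs 20 "b"
df <- data.frame(slope = runif(120),
                 class = factor(rep(c("a", "b"), times = c(100, 20))))

# Down-sample: randomly drop rows so every class matches the smallest one
down <- downSample(x = df["slope"], y = df$class, yname = "class")

# Up-sample: resample with replacement so every class matches the largest one
up <- upSample(x = df["slope"], y = df$class, yname = "class")

down_counts <- table(down$class)
up_counts <- table(up$class)
```

Down-sampling discards data while up-sampling duplicates rows, so which one is preferable depends on how many points the smallest class has.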
Classification or regression
Whether this function treats your inputs as a classification or regression problem depends on the class of the outcome variable. An outcome of class factor is treated as a classification problem, while all other classes are treated as regression problems.
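The rule above amounts to a check on is.factor(). The helper below is a hypothetical illustration of that rule, not a function exported by SAiVE; it shows why integer class codes need as.factor() before being used for classification.

```r
# Hypothetical helper mirroring the rule stated above
problem_type <- function(x) {
  if (is.factor(x)) "classification" else "regression"
}

codes <- c(1L, 2L, 2L, 3L)            # integer class codes
type_raw <- problem_type(codes)        # integers are not factors
type_fct <- problem_type(as.factor(codes))
```

This is the reason the examples convert the outcome column with as.factor() before calling spatPredict().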
Author(s)
Ghislain de Laplante (gdela069@uottawa.ca or ghislain.delaplante@yukon.ca)
Examples
# These examples can take a while to run!

# Install packages underpinning examples
rlang::check_installed("ranger", reason = "required to run example.")
rlang::check_installed("Rborist", reason = "required to run example.")

# Single model, single trainControl
trainControl <- caret::trainControl(
  method = "repeatedcv",
  number = 2, # 2-fold Cross-validation
  repeats = 2, # repeated 2 times
  verboseIter = FALSE,
  returnResamp = "final",
  savePredictions = "all",
  allowParallel = TRUE)

outcome <- permafrost_polygons
outcome$Type <- as.factor(outcome$Type)

result <- spatPredict(features = c(aspect, solrad, slope),
                      outcome = outcome,
                      poly_sample = 100,
                      trainControl = trainControl,
                      methods = "ranger",
                      n.cores = 2,
                      predict = TRUE)

terra::plot(result$prediction)

# Multiple models, multiple trainControl
trainControl <- list(
  "ranger" = caret::trainControl(
    method = "repeatedcv",
    number = 2,
    repeats = 2,
    verboseIter = FALSE,
    returnResamp = "final",
    savePredictions = "all",
    allowParallel = TRUE),
  "Rborist" = caret::trainControl(
    method = "boot",
    number = 2,
    repeats = 2,
    verboseIter = FALSE,
    returnResamp = "final",
    savePredictions = "all",
    allowParallel = TRUE)
)

result <- spatPredict(features = c(aspect, solrad, slope),
                      outcome = outcome,
                      poly_sample = 100,
                      trainControl = trainControl,
                      methods = c("ranger", "Rborist"),
                      n.cores = 2,
                      predict = TRUE)

terra::plot(result$prediction)