R: Compares models via spatial cross-validation

rf_compare {spatialRF}

R Documentation

Compares models via spatial cross-validation

Description

Uses rf_evaluate() to compare the performance of several models on independent spatial folds via spatial cross-validation.

Usage

rf_compare(
  models = NULL,
  xy = NULL,
  repetitions = 30,
  training.fraction = 0.75,
  metrics = c("r.squared", "pseudo.r.squared", "rmse", "nrmse", "auc"),
  distance.step = NULL,
  distance.step.x = NULL,
  distance.step.y = NULL,
  fill.color = viridis::viridis(100, option = "F", direction = -1, alpha = 0.8),
  line.color = "gray30",
  seed = 1,
  verbose = TRUE,
  n.cores = parallel::detectCores() - 1,
  cluster = NULL
)

Arguments

`models`	Named list with models resulting from `rf()`, `rf_spatial()`, `rf_tuning()`, or `rf_evaluate()`. Example: `models = list(a = model.a, b = model.b)`. Default: `NULL`
`xy`	Data frame or matrix with two columns containing coordinates and named "x" and "y". Default: `NULL`
`repetitions`	Integer, number of spatial folds to use during cross-validation. Must be lower than the total number of rows available in the model's data. Default: `30`
`training.fraction`	Numeric between 0.5 and 0.9, proportion of records to be used as the training set during spatial cross-validation. Default: `0.75`
`metrics`	Character vector, names of the performance metrics selected. The possible values are: "r.squared" (`cor(obs, pred) ^ 2`), "pseudo.r.squared" (`cor(obs, pred)`), "rmse" (`sqrt(sum((obs - pred)^2)/length(obs))`), "nrmse" (`rmse/(quantile(obs, 0.75) - quantile(obs, 0.25))`), and "auc" (only for binary responses with values 1 and 0). Default: `c("r.squared", "pseudo.r.squared", "rmse", "nrmse", "auc")`
`distance.step`	Numeric, argument `distance.step` of `thinning_til_n()`. Distance step used during the selection of the centers of the training folds. These fold centers are selected by thinning the data until a number of folds equal to or lower than `repetitions` is reached. Reduce it if the number of training folds is lower than expected. Default: `NULL` (1/1000th the maximum distance between records in `xy`).
`distance.step.x`	Numeric, argument `distance.step.x` of `make_spatial_folds()`. Distance step used during the growth in the x axis of the buffers defining the training folds. Default: `NULL` (1/1000th the range of the x coordinates).
`distance.step.y`	Numeric, argument `distance.step.y` of `make_spatial_folds()`. Distance step used during the growth in the y axis of the buffers defining the training folds. Default: `NULL` (1/1000th the range of the y coordinates).
`fill.color`	Character vector with hexadecimal codes (e.g. "#440154FF" "#21908CFF" "#FDE725FF"), or function generating a palette (e.g. `viridis::viridis(100)`). Default: `viridis::viridis(100, option = "F", direction = -1, alpha = 0.8)`
`line.color`	Character string, color of the line produced by `ggplot2::geom_smooth()`. Default: `"gray30"`
`seed`	Integer, random seed to facilitate reproducibility. If set to a given number, the results of the function are always the same. Default: `1`
`verbose`	Logical. If `TRUE`, messages and plots generated during the execution of the function are displayed. Default: `TRUE`
`n.cores`	Integer, number of cores to use for parallel execution. Creates a socket cluster with `parallel::makeCluster()`, runs operations in parallel with `foreach` and `%dopar%`, and stops the cluster with `parallel::stopCluster()` when the job is done. Default: `parallel::detectCores() - 1`
`cluster`	A cluster definition generated with `parallel::makeCluster()`. If provided, overrides `n.cores`. When `cluster = NULL` (default value), and `model` is provided, the cluster in `model`, if any, is used instead. If this cluster is `NULL`, then the function uses `n.cores` instead. The function does not stop a provided cluster, so it should be stopped with `parallel::stopCluster()` afterwards. The cluster definition is stored in the output list under the name "cluster" so it can be passed to other functions via the `model` argument, or using the `%>%` pipe. Default: `NULL`
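
The formulas listed under `metrics` can be checked by hand in base R. The following is a minimal sketch, not the package's internal code: the `obs` and `pred` vectors are made-up values for illustration, and the metric names simply follow the definitions given above ("auc" is left out because it applies to binary responses only).

```r
# Hypothetical observed and predicted values, for illustration only.
obs  <- c(10, 12, 15, 18, 20, 25)
pred <- c(11, 11, 16, 17, 22, 23)

# Metrics computed exactly as written in the description of `metrics`.
r.squared        <- cor(obs, pred)^2
pseudo.r.squared <- cor(obs, pred)
rmse             <- sqrt(sum((obs - pred)^2) / length(obs))

# nrmse divides rmse by the interquartile range of the observations;
# unname() drops the "75%" label that quantile() attaches.
nrmse <- unname(rmse / (quantile(obs, 0.75) - quantile(obs, 0.25)))

# Collect the values in one named vector for inspection.
round(
  c(
    r.squared = r.squared,
    pseudo.r.squared = pseudo.r.squared,
    rmse = rmse,
    nrmse = nrmse
  ),
  3
)
```

Because "nrmse" is scaled by the interquartile range of the observed values, it is comparable across responses measured in different units, whereas "rmse" stays in the units of the response.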

Value

A list with three slots:

comparison.df: Data frame with one performance value per spatial fold, metric, and model.
spatial.folds: List with the indices of the training and testing records for each evaluation repetition.
plot: Violin plot of comparison.df.

Examples

if(interactive()){

 #loading example data
 data(distance_matrix)
 data(plant_richness_df)

 #fitting random forest model
 rf.model <- rf(
   data = plant_richness_df,
   dependent.variable.name = "richness_species_vascular",
   predictor.variable.names = colnames(plant_richness_df)[5:21],
   distance.matrix = distance_matrix,
   distance.thresholds = 0,
   n.cores = 1
 )

 #fitting a spatial model with Moran's Eigenvector Maps
 rf.spatial <- rf_spatial(
   model = rf.model,
   n.cores = 1
 )

 #comparing the spatial and non spatial models
 comparison <- rf_compare(
   models = list(
     `Non spatial` = rf.model,
     Spatial = rf.spatial
   ),
   xy = plant_richness_df[, c("x", "y")],
   metrics = c("r.squared", "rmse"),
   n.cores = 1
 )

}
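
The `cluster` argument described under Arguments can be used instead of `n.cores`. The sketch below assumes a two-worker socket cluster (an arbitrary size chosen for illustration) and reuses the models from the example above. The name `my.cluster` is hypothetical, and the spatialRF calls are guarded so they only run when the package is installed. As noted in the description of `cluster`, the function does not stop a cluster supplied by the user, so it is stopped explicitly at the end.

```r
# Sketch only: a cluster of 2 workers is an arbitrary choice for illustration.
my.cluster <- parallel::makeCluster(2, type = "PSOCK")

# The spatialRF calls mirror the documented example and are skipped
# when the package is not installed.
if (requireNamespace("spatialRF", quietly = TRUE)) {
  library(spatialRF)
  data(distance_matrix)
  data(plant_richness_df)

  rf.model <- rf(
    data = plant_richness_df,
    dependent.variable.name = "richness_species_vascular",
    predictor.variable.names = colnames(plant_richness_df)[5:21],
    distance.matrix = distance_matrix,
    distance.thresholds = 0,
    n.cores = 1
  )

  rf.spatial <- rf_spatial(
    model = rf.model,
    n.cores = 1
  )

  # Passing `cluster` overrides `n.cores`, per the argument description.
  comparison <- rf_compare(
    models = list(
      `Non spatial` = rf.model,
      Spatial = rf.spatial
    ),
    xy = plant_richness_df[, c("x", "y")],
    metrics = c("r.squared", "rmse"),
    verbose = FALSE,
    cluster = my.cluster
  )
}

# A user-provided cluster is not stopped by the function, so stop it here.
parallel::stopCluster(my.cluster)
```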

[Package spatialRF version 1.1.4 Index]
