cv_MRF_diag {MRFcov} | R Documentation |
MRF cross validation and assessment of predictive performance
Description
cv_MRF_diag runs cross-validation of MRFcov models and tests predictive performance.
cv_MRF_diag_rep fits a single node-optimised model and tests this model's predictive performance across multiple test subsets of the data.
cv_MRF_diag_rep_spatial fits a single node-optimised spatial model and tests this model's predictive performance across multiple test subsets of the data.
All cv_MRF functions assess model predictive performance and produce either diagnostic plots or matrices of predictive metrics.
Usage
cv_MRF_diag(
  data,
  symmetrise,
  n_nodes,
  n_cores,
  sample_seed,
  n_folds,
  n_fold_runs,
  n_covariates,
  compare_null,
  family,
  plot = TRUE,
  cached_model,
  cached_predictions,
  mod_labels = NULL
)
cv_MRF_diag_rep(
  data,
  symmetrise,
  n_nodes,
  n_cores,
  sample_seed,
  n_folds,
  n_fold_runs,
  n_covariates,
  compare_null,
  family,
  plot = TRUE
)
cv_MRF_diag_rep_spatial(
  data,
  coords,
  symmetrise,
  n_nodes,
  n_cores,
  sample_seed,
  n_folds,
  n_fold_runs,
  n_covariates,
  compare_null,
  family,
  plot = TRUE
)
Arguments
data |
Dataframe. The input data where the |
symmetrise |
The method to use for symmetrising corresponding parameter estimates
(which are taken from separate regressions). Options are |
n_nodes |
Positive integer. The index of the last column in |
n_cores |
Positive integer. The number of cores to spread the job across using
|
sample_seed |
Numeric. This seed will be used as the basis for dividing data into folds. Default is a random seed between 1 and 100000 |
n_folds |
Integer. The number of folds for cross-validation. Default is 10 |
n_fold_runs |
Integer. The number of total training runs to perform. During
each run, the data will be split into |
n_covariates |
Positive integer. The number of covariates in |
compare_null |
Logical. If |
family |
The response type. Responses can be quantitative continuous ( |
plot |
Logical. If |
cached_model |
Used by function |
cached_predictions |
Used by function |
mod_labels |
Optional character string of labels for the two models being compared
(if |
coords |
A two-column |
Details
Node-optimised models are fitted using cv.glmnet, and these models are used to predict test subsets of the data. Test and training data subsets are created using createFolds.
To account for uncertainty in parameter estimates and in random fold generation, it is recommended to perform cross-validation multiple times (by controlling the n_fold_runs argument) using cv_MRF_diag_rep, which supplies a single cached model and that model's predictions. This is useful for optimising a single model (using cv.glmnet) and testing this model's predictive performance across many test subsets. Alternatively, one can run cv_MRF_diag many times to fit a different model in each iteration. This will be slower but technically more sound.
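The idea of repeating random fold generation can be sketched in base R. This is only an illustration of how n_folds and n_fold_runs interact, not the package's internal code: the seed, the number of rows and the object names are hypothetical, and createFolds may assign rows to folds differently (this sketch uses simple random assignment).

```r
# Illustrative stand-in for repeated k-fold splitting (not MRFcov internals)
set.seed(1234)            # hypothetical seed, playing the role of sample_seed
n_obs       <- 100        # hypothetical number of rows in the data
n_folds     <- 10         # folds per run
n_fold_runs <- 3          # number of times the splitting is repeated

fold_runs <- lapply(seq_len(n_fold_runs), function(run) {
  # Randomly assign every row to one of n_folds roughly equal-sized folds
  fold_id <- sample(rep(seq_len(n_folds), length.out = n_obs))
  # Held-out (test) row indices for each fold in this run
  split(seq_len(n_obs), fold_id)
})

# Within each run, every row is held out exactly once
sapply(fold_runs, function(folds) length(unlist(folds)))
```

Because the fold assignment is redrawn in every run, predictive metrics computed per test fold vary between runs, which is the uncertainty that repeated cross-validation is meant to capture.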
Value
If plot = TRUE, a ggplot2 object is returned. This will be a plot containing boxplots of predictive metrics across test sets using the optimised model (see cv.glmnet for further details of lambda1 optimisation). If plot = FALSE, a matrix of prediction metrics is returned.
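For binary responses, the metric names used in the examples (such as mean_sensitivity and mean_tot_pred) suggest sensitivity and the proportion of true predictions. The sketch below shows the standard definitions of these quantities on made-up observed and predicted values. It assumes the usual confusion-matrix definitions and is not taken from the package source.

```r
# Toy observed and predicted presence/absence values (made up for illustration)
observed  <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1)

# Confusion-matrix counts
true_pos  <- sum(observed == 1 & predicted == 1)
false_neg <- sum(observed == 1 & predicted == 0)
true_neg  <- sum(observed == 0 & predicted == 0)
false_pos <- sum(observed == 0 & predicted == 1)

# Standard definitions (assumed here, not copied from MRFcov)
sensitivity <- true_pos / (true_pos + false_neg)                 # 0.75
specificity <- true_neg / (true_neg + false_pos)                 # 4 / 6
prop_true   <- (true_pos + true_neg) / length(observed)          # 0.7

round(c(sensitivity = sensitivity,
        specificity = specificity,
        prop_true   = prop_true), 3)
```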
References
Clark, NJ, Wells, K and Lindberg, O. (2018). Unravelling changing interspecific interactions across environmental gradients using Markov random fields. Ecology. doi: 10.1002/ecy.2221
See Also
MRFcov, predict_MRF, cv.glmnet
Examples
data("Bird.parasites")
# Generate boxplots of model predictive metrics
cv_MRF_diag(data = Bird.parasites, n_nodes = 4,
            n_cores = 1, family = 'binomial')
# Generate boxplots comparing the CRF to an MRF model (no covariates)
cv_MRF_diag(data = Bird.parasites, n_nodes = 4,
            n_cores = 1, family = 'binomial',
            compare_null = TRUE)
# Replicate 10-fold cross-validation 10 times
cv.preds <- cv_MRF_diag_rep(data = Bird.parasites, n_nodes = 4,
                            n_cores = 1, family = 'binomial',
                            compare_null = TRUE,
                            plot = FALSE, n_fold_runs = 10)
# Plot model sensitivity and % true predictions
library(ggplot2)
gridExtra::grid.arrange(
  ggplot(data = cv.preds, aes(y = mean_sensitivity, x = model)) +
    geom_boxplot() + theme(axis.text.x = element_blank()) +
    labs(x = ''),
  ggplot(data = cv.preds, aes(y = mean_tot_pred, x = model)) +
    geom_boxplot(),
  ncol = 1,
  heights = c(1, 1))
# Create some sample Poisson data with strong correlations
cov <- rnorm(500, 0.2)
cov2 <- rnorm(500, 1)
sp.2 <- rpois(500, lambda = exp(1.5 + (cov * 0.9)))
poiss.dat <- data.frame(sp.1 = rpois(500, lambda = exp(0.5 + (cov * 0.3))),
                        sp.2 = sp.2,
                        sp.3 = rpois(500, lambda = exp(log(sp.2 + 1) + (cov * -0.5))),
                        cov = cov,
                        cov2 = cov2)
# A CRF should produce a better fit (lower deviance, lower MSE)
cvMRF.poiss <- cv_MRF_diag(data = poiss.dat, n_nodes = 3,
                           n_folds = 10,
                           family = 'poisson',
                           compare_null = TRUE, plot = TRUE)