| cv_MRF_diag {MRFcov} | R Documentation |
MRF cross validation and assessment of predictive performance
Description
cv_MRF_diag runs cross-validation of MRFcov models and tests their predictive
performance.
cv_MRF_diag_rep fits a single node-optimised model
and tests this model's predictive performance across multiple test subsets of the data.
cv_MRF_diag_rep_spatial fits a single node-optimised spatial model
and tests this model's predictive performance across multiple test subsets of the data.
All cv_MRF functions assess model predictive performance and produce
either diagnostic plots or matrices of predictive metrics.
Usage
cv_MRF_diag(
data,
symmetrise,
n_nodes,
n_cores,
sample_seed,
n_folds,
n_fold_runs,
n_covariates,
compare_null,
family,
plot = TRUE,
cached_model,
cached_predictions,
mod_labels = NULL
)
cv_MRF_diag_rep(
data,
symmetrise,
n_nodes,
n_cores,
sample_seed,
n_folds,
n_fold_runs,
n_covariates,
compare_null,
family,
plot = TRUE
)
cv_MRF_diag_rep_spatial(
data,
coords,
symmetrise,
n_nodes,
n_cores,
sample_seed,
n_folds,
n_fold_runs,
n_covariates,
compare_null,
family,
plot = TRUE
)
Arguments
data |
Dataframe. The input data, where the n_nodes left-most columns are the variables to be represented by nodes in the graph |
symmetrise |
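The method to use for symmetrising corresponding parameter estimates
(which are taken from separate regressions). Options are min (take the coefficient with the smallest absolute value), max (take the coefficient with the largest absolute value) or mean (take the mean of the two coefficients) |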
n_nodes |
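Positive integer. The index of the last column in data that is represented by a node in the graph. Columns with an index greater than n_nodes are treated as covariates |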
n_cores |
Positive integer. The number of cores to spread the job across using
makePSOCKcluster from the parallel package |
sample_seed |
Numeric. This seed will be used as the basis for dividing data into folds. Default is a random seed between 1 and 100000 |
n_folds |
Integer. The number of folds for cross-validation. Default is 10 |
n_fold_runs |
Integer. The total number of training runs to perform. During
each run, the data will be split into n_folds cross-validation folds |
n_covariates |
Positive integer. The number of covariates in data, before cross-multiplication |
compare_null |
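Logical. If TRUE, a null MRF model (without covariates) is also fitted so that its predictive performance can be compared with that of the CRF model |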
family |
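The response type. Responses can be quantitative continuous (family = "gaussian"), non-negative counts (family = "poisson") or binomial 1s and 0s (family = "binomial") |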
plot |
Logical. If TRUE, a ggplot2 object is returned. If FALSE, the prediction metrics are returned as a matrix |
cached_model |
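Used by function cv_MRF_diag_rep to supply an already optimised model, so that node-optimised model fitting is not repeated unnecessarily |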
cached_predictions |
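Used by function cv_MRF_diag_rep to supply the cached model's predictions, so that they are not recomputed unnecessarily |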
mod_labels |
Optional character vector of labels for the two models being compared
(if compare_null = TRUE) |
coords |
A two-column dataframe (with nrow(data) rows) giving the spatial coordinates of each observation in data |
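The symmetrise options can be illustrated on a small coefficient matrix. This is a minimal base-R sketch, not MRFcov's internal code. It assumes that B[i, j] holds the coefficient for node j from the separate regression of node i (so B[i, j] and B[j, i] are two estimates of the same interaction), and that the options are "min" (keep the estimate with the smaller absolute value), "max" (keep the larger) and "mean" (average the two). The helper symmetrise_coefs() is hypothetical.

```r
# Hypothetical helper illustrating min / max / mean symmetrisation of a
# square matrix of node-wise regression coefficients.
# Assumes B[i, j] and B[j, i] are two estimates of the same interaction.
symmetrise_coefs <- function(B, method = c("mean", "min", "max")) {
  method <- match.arg(method)
  Bt <- t(B)
  out <- switch(method,
    # Average the two estimates
    mean = (B + Bt) / 2,
    # Keep whichever estimate has the smaller absolute value
    min = ifelse(abs(B) <= abs(Bt), B, Bt),
    # Keep whichever estimate has the larger absolute value
    max = ifelse(abs(B) >= abs(Bt), B, Bt)
  )
  # Mirror the upper triangle into the lower one so that ties
  # (equal absolute values, opposite signs) resolve symmetrically
  out[lower.tri(out)] <- t(out)[lower.tri(out)]
  dimnames(out) <- dimnames(B)
  out
}

# Example: the two estimates of the sp.1 / sp.2 interaction disagree
B <- matrix(c(0,    0.4,
              -0.8, 0), nrow = 2, byrow = TRUE,
            dimnames = list(c("sp.1", "sp.2"), c("sp.1", "sp.2")))
symmetrise_coefs(B, "mean")  # off-diagonal entries are -0.2
symmetrise_coefs(B, "min")   # off-diagonal entries are 0.4
symmetrise_coefs(B, "max")   # off-diagonal entries are -0.8
```

Taking "mean" shrinks disagreeing estimates towards zero, while "min" and "max" keep one of the two original estimates.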
Details
Node-optimised models are fitted using cv.glmnet,
and these models are used to predict the test subsets of the data.
Test and training data subsets are created using createFolds.
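Fold generation can be sketched in base R. This is an illustrative simplification rather than a reimplementation of createFolds: caret's createFolds stratifies the split on the outcome it is given, whereas the hypothetical make_folds() below assigns rows to folds purely at random. It does show the role of sample_seed: the same seed always reproduces the same folds. The default seed mirrors the documented "random seed between 1 and 100000".

```r
# Hypothetical helper: randomly assign n rows to k folds of near-equal size.
# Unlike caret::createFolds, this does not stratify on the response.
make_folds <- function(n, k = 10, seed = sample(1:100000, 1)) {
  set.seed(seed)
  # Shuffle the row indices, then deal them out to folds 1..k in turn
  shuffled <- sample(seq_len(n))
  fold_id <- rep_len(seq_len(k), n)
  split(shuffled, fold_id)
}

folds <- make_folds(n = 25, k = 5, seed = 123)
lengths(folds)  # each of the 5 folds holds 5 test rows

# Each fold serves once as the test set; the remaining rows train the model
test_rows <- folds[[1]]
train_rows <- setdiff(seq_len(25), test_rows)
```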
To account for uncertainty in parameter estimates and in random fold generation, it is recommended
to perform cross-validation multiple times (by controlling the n_fold_runs argument).
cv_MRF_diag_rep does this by fitting a single model and supplying it, together with its predictions, as a cached model.
This is useful for optimising a single model (using cv.glmnet) and testing
its predictive performance across many test subsets. Alternatively, one can run
cv_MRF_diag many times to fit a different model in each iteration. This is slower but
technically more sound, because the parameters are re-estimated in every iteration.
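The repeated-run strategy can be sketched in base R. This is a deliberately simplified stand-in: it fits an ordinary logistic regression with glm() instead of a node-optimised cv.glmnet model, uses simulated data, and scores each test fold by the proportion of correct 0/1 predictions at an assumed 0.5 threshold. The helper cv_run() is hypothetical. Like running cv_MRF_diag many times, it refits the model in every training fold, and each run draws new folds from a different seed, so the spread of scores reflects fold-to-fold variability.

```r
# Simulated binary response with one covariate (illustrative data only)
set.seed(42)
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * dat$x))

# Hypothetical helper: one k-fold cross-validation run, returning the
# proportion of correct predictions in each test fold
cv_run <- function(dat, k, seed) {
  set.seed(seed)
  fold_id <- sample(rep_len(seq_len(k), nrow(dat)))
  sapply(seq_len(k), function(f) {
    train <- dat[fold_id != f, ]
    test  <- dat[fold_id == f, ]
    fit <- glm(y ~ x, family = binomial, data = train)
    p <- predict(fit, newdata = test, type = "response")
    # Classify as 1 when the predicted probability exceeds 0.5
    mean((p > 0.5) == test$y)
  })
}

# Repeat 10-fold cross-validation over 5 runs, each with new random folds
n_fold_runs <- 5
scores <- sapply(seq_len(n_fold_runs),
                 function(r) cv_run(dat, k = 10, seed = r))
dim(scores)  # 10 folds x 5 runs
summary(as.vector(scores))
```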
Value
If plot = TRUE, a ggplot2 object is returned. This will be
a plot containing boxplots of predictive metrics across test sets using the
optimised model (see cv.glmnet for further details of lambda1
optimisation). If plot = FALSE, a matrix of prediction metrics is returned.
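For binomial models, the returned matrix includes columns such as mean_sensitivity and mean_tot_pred (used in the Examples). The sketch below computes the standard definitions of sensitivity, specificity and the proportion of true predictions for one test fold, plus mean squared error for count or continuous responses. It assumes these standard definitions match the package's metrics, which has not been checked against the MRFcov source; the helpers binary_metrics() and mse() are hypothetical, and the package may report some metrics as percentages rather than proportions.

```r
# Hypothetical helper: standard confusion-matrix metrics for 0/1 data
binary_metrics <- function(observed, predicted) {
  tp <- sum(predicted == 1 & observed == 1)  # true positives
  tn <- sum(predicted == 0 & observed == 0)  # true negatives
  fp <- sum(predicted == 1 & observed == 0)  # false positives
  fn <- sum(predicted == 0 & observed == 1)  # false negatives
  c(sensitivity = tp / (tp + fn),               # observed 1s predicted as 1
    specificity = tn / (tn + fp),               # observed 0s predicted as 0
    tot_pred    = (tp + tn) / length(observed)) # all predictions correct
}

# Hypothetical helper: mean squared error for count or continuous responses
mse <- function(observed, predicted) mean((observed - predicted)^2)

observed  <- c(1, 1, 1, 0, 0, 0, 0, 1)
predicted <- c(1, 0, 1, 0, 0, 0, 0, 1)
binary_metrics(observed, predicted)
# sensitivity = 0.75, specificity = 1, tot_pred = 0.875

mse(c(2, 0, 5), c(1, 0, 3))  # (1 + 0 + 4) / 3
```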
References
Clark, NJ, Wells, K and Lindberg, O. (2018). Unravelling changing interspecific interactions across environmental gradients using Markov random fields. Ecology. doi: 10.1002/ecy.2221
See Also
MRFcov,
predict_MRF,
cv.glmnet
Examples
library(MRFcov)
data("Bird.parasites")
# Generate boxplots of model predictive metrics
cv_MRF_diag(data = Bird.parasites, n_nodes = 4,
n_cores = 1, family = 'binomial')
# Generate boxplots comparing the CRF to an MRF model (no covariates)
cv_MRF_diag(data = Bird.parasites, n_nodes = 4,
n_cores = 1, family = 'binomial',
compare_null = TRUE)
# Replicate 10-fold cross-validation 10 times
cv.preds <- cv_MRF_diag_rep(data = Bird.parasites, n_nodes = 4,
n_cores = 1, family = 'binomial',
compare_null = TRUE,
plot = FALSE, n_fold_runs = 10)
# Plot model sensitivity and % true predictions
library(ggplot2)
gridExtra::grid.arrange(
ggplot(data = cv.preds, aes(y = mean_sensitivity, x = model)) +
geom_boxplot() + theme(axis.text.x = ggplot2::element_blank()) +
labs(x = ''),
ggplot(data = cv.preds, aes(y = mean_tot_pred, x = model)) +
geom_boxplot(),
ncol = 1,
heights = c(1, 1))
# Create some sample Poisson data with strong correlations
cov <- rnorm(500, 0.2)
cov2 <- rnorm(500, 1)
sp.2 <- rpois(500, lambda = exp(1.5 + (cov * 0.9)))
poiss.dat <- data.frame(sp.1 = rpois(500, lambda = exp(0.5 + (cov * 0.3))),
sp.2 = sp.2,
sp.3 = rpois(500, lambda = exp(log(sp.2 + 1) + (cov * -0.5))),
cov = cov,
cov2 = cov2)
# A CRF should produce a better fit (lower deviance, lower MSE)
cvMRF.poiss <- cv_MRF_diag(data = poiss.dat, n_nodes = 3,
n_folds = 10,
family = 'poisson',
compare_null = TRUE, plot = TRUE)