R: Compute K-fold cross-validation predicted values from a...

trophicSDM_CV {webSDM}

R Documentation

Compute K-fold cross-validation predicted values from a fitted trophicSDM model

Description

Once the CV predicted values are obtained, their quality can be evaluated with evaluateModelFit().

Usage

trophicSDM_CV(
  tSDM,
  K,
  partition = NULL,
  prob.cov = FALSE,
  pred_samples = NULL,
  iter = NULL,
  chains = NULL,
  run.parallel = FALSE,
  verbose = FALSE
)

Arguments

`tSDM`	A trophicSDMfit object obtained with trophicSDM()
`K`	The number of folds for the K-fold cross validation
`partition`	Optional. A vector assigning each site to one of the K folds used for cross-validation
`prob.cov`	Only used when predicting with presence-absence data. Whether to use the predicted probability of presence (prob.cov = TRUE) or the transformed presence-absences (default, prob.cov = FALSE) to predict species distributions.
`pred_samples`	Number of samples to draw from the species' posterior predictive distribution when method = "stan_glm". If NULL, defaults to the number of iterations divided by 10.
`iter`	For method = "stan_glm": number of iterations of each MCMC chain used to fit the trophicSDM model. Defaults to the number of iterations used to fit the provided trophicSDMfit object
`chains`	For method = "stan_glm": number of MCMC chains used to fit the trophicSDM model. Defaults to the number of chains used to fit the provided trophicSDMfit object
`run.parallel`	Whether to run the code in parallel when possible, which can reduce computation time. Defaults to FALSE, as shown in Usage
`verbose`	Whether to print the progress of the algorithm
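
The `partition` argument is easiest to understand by building one by hand. The sketch below uses base R only. It assumes, based on the example on this page, that `partition` holds one fold label in 1..K per site, in the row order of the data. The helper `make_partition` and its arguments `n_sites` and `seed` are hypothetical names introduced for illustration and are not part of webSDM.

```r
# Hypothetical helper: assign n_sites sites to K folds of near-equal size.
# Assumption: trophicSDM_CV() expects one fold label (1..K) per site.
make_partition <- function(n_sites, K, seed = NULL) {
  stopifnot(K >= 2, K <= n_sites)
  if (!is.null(seed)) set.seed(seed)
  # rep_len() cycles through 1..K so fold sizes differ by at most one;
  # sample() shuffles the labels so folds do not follow the row order.
  sample(rep_len(seq_len(K), n_sites))
}

partition <- make_partition(n_sites = 1000, K = 3, seed = 42)
table(partition)  # number of sites in each fold
```

A vector built this way could then be passed as `partition = partition` together with the matching `K`. Shuffling the labels is a design choice: if sites are stored in spatial or temporal order, contiguous blocks such as `c(rep(1, 500), rep(2, 500))` give a blocked cross-validation instead of a random one.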

Value

A list containing:

`meanPred`	a sites x species matrix of predicted species occurrences at each site (e.g. probability of presence). With stan_glm, the posterior predictive mean is returned
`Pred975`, `Pred025`	Only for method = "stan_glm", the 97.5% and 2.5% quantiles of the posterior predictive distribution
`partition`	the partition vector used to compute the K fold cross-validation
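
To make the meaning of the cross-validated predictions concrete, here is a minimal, generic K-fold sketch for a single species using base R `glm()`. It is not webSDM's implementation and ignores the trophic structure entirely; the data are simulated and all object names (`dat`, `cv_pred`, and so on) are hypothetical. It only illustrates the principle that each site's prediction comes from a model fitted without that site.

```r
# Generic K-fold cross-validation for one species with base R glm().
# NOT webSDM's implementation: simulated data, hypothetical names.
set.seed(1)
n <- 300
X <- data.frame(X_1 = rnorm(n), X_2 = rnorm(n))
y <- rbinom(n, 1, plogis(0.5 + X$X_1 - 0.5 * X$X_2))
dat <- cbind(y = y, X)

K <- 3
partition <- sample(rep_len(seq_len(K), n))  # one fold label per site

cv_pred <- rep(NA_real_, n)
for (k in seq_len(K)) {
  test <- partition == k
  # Fit on the K-1 training folds only
  fit <- glm(y ~ X_1 + X_2, family = binomial(link = "logit"),
             data = dat[!test, ])
  # Predict probability of presence for the held-out fold
  cv_pred[test] <- predict(fit, newdata = dat[test, ], type = "response")
}

summary(cv_pred)  # out-of-fold predicted probabilities
```

In this sketch `cv_pred` plays the role of one column of `meanPred`: every entry is an out-of-fold prediction, so comparing it with the observed `y` gives an estimate of predictive performance rather than of in-sample fit.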

Author(s)

Giovanni Poggiato

Examples

data(Y, X, G)
# define abiotic part of the model
env.formula = "~ X_1 + X_2"
# Run the model with bottom-up control using stan_glm as fitting method and no penalisation
# (set iter = 1000 to obtain reliable results)

m = trophicSDM(Y, X, G, env.formula, iter = 50, 
               family = binomial(link = "logit"), penal = NULL, 
               mode = "prey", method = "stan_glm")

# Run a 3-fold (K = 3) cross-validation. Prediction is done using presence-absences of prey
# (prob.cov = FALSE, see ?predict.trophicSDM) with 10 draws from the posterior distribution
# (pred_samples = 10)
CV = trophicSDM_CV(m, K = 3, prob.cov = FALSE, pred_samples = 10, run.parallel = FALSE)
# Use predicted values to evaluate model goodness of fit in cross validation
Ypred = CV$meanPred[,colnames(Y)]

evaluateModelFit(m, Ynew = Y, Ypredicted = Ypred)

# Now with K = 2, a user-specified partition of the sites, and glm as fitting method
m = trophicSDM(Y, X, G, env.formula, iter = 50,
               family = binomial(link = "logit"), penal = NULL, 
               mode = "prey", method = "glm")
partition = c(rep(1,500),rep(2,500))
CV = trophicSDM_CV(m, K = 2, partition = partition, prob.cov = FALSE,
                   pred_samples = 10, run.parallel = FALSE)
Ypred = CV$meanPred[,colnames(Y)]
evaluateModelFit(m, Ynew = Y, Ypredicted = Ypred)

[Package webSDM version 1.1-5 Index]