R: Fit and select an Explicit Interaction Community Model (EICM)

eicm {eicm}

R Documentation

Fit and select an Explicit Interaction Community Model (EICM)

Description

Given species occurrence data and (optionally) measured environmental predictors, fits and selects an EICM that models species occurrence probability as a function of measured predictors, unmeasured predictors (latent variables) and direct species interactions.

Usage

eicm(
  occurrences,
  env = NULL,
  traits = NULL,
  intercept = TRUE,
  n.latent = 0,
  rotate.latents = FALSE,
  scale.latents = TRUE,
  forbidden = NULL,
  allowed = NULL,
  mask.sp = NULL,
  exclude.prevalence = 0,
  regularization = c(ifelse(n.latent > 0, 6, 0.5), 1),
  regularization.type = "hybrid",
  penalty = 4,
  theta.threshold = 0.5,
  latent.lambda = 1,
  fit.all.with.latents = TRUE,
  popsize.sel = 2,
  n.cores = parallel::detectCores(),
  parallel = FALSE,
  true.model = NULL,
  do.selection = TRUE,
  do.plots = TRUE,
  fast = FALSE,
  refit.selected = TRUE
)

Arguments

`occurrences`	a binary (0/1) sample x species matrix, possibly including NAs.
`env`	an optional sample x environmental variable matrix, for the known environmental predictors.
`traits`	an optional species x trait matrix. Currently, it is only used for excluding species interactions a priori.
`intercept`	logical specifying whether to add a column for the species-level intercepts.
`n.latent`	the number of latent variables to estimate.
`rotate.latents`	logical. Rotate the estimated latent variable values (the values of the latents at each sample) in the first step with PCA? Defaults to FALSE.
`scale.latents`	logical. Standardize the estimated latent variable values (the values of the latents at each sample) in the first step? Defaults to TRUE.
`forbidden`	a formula (or list of formulas) defining which species interactions are not to be estimated. See Details. This constraint is cumulative with other constraints (`mask.sp` and `exclude.prevalence`).
`allowed`	a formula (or list of formulas) defining which species interactions are to be estimated. See Details. This constraint is cumulative with other constraints (`mask.sp` and `exclude.prevalence`).
`mask.sp`	a scalar or a binary square species x species matrix defining which species interactions to exclude (0) or include (1) a priori. If a scalar (0 or 1), 0 excludes all interactions, 1 allows all interactions. If a matrix, species in the columns affect species in the rows, so, setting `mask.sp[3, 8] <- 0` means that species #8 is assumed a priori to not affect species #3. This constraint is cumulative with other constraints (`forbidden` and `exclude.prevalence`).
`exclude.prevalence`	exclude species interactions that are caused by species with prevalence equal to or lower than this value. This constraint is cumulative with other constraints (`forbidden` and `mask.sp`).
`regularization`	a two-element numeric vector defining the regularization lambdas used for environmental coefficients and for species interactions respectively. See details.
`regularization.type`	one of "lasso", "ridge" or "hybrid", defining the type of penalty to apply. Type "hybrid" applies ridge penalty to environmental coefficients and LASSO to interaction coefficients.
`penalty`	the penalty applied to the number of species interactions to include, during variable selection.
`theta.threshold`	exclude species interactions (from network selection) whose preliminary coefficient (in absolute value) is lower than this value. This exclusion criterion is cumulative with the other user-defined exclusions.
`latent.lambda`	the regularization applied to latent variables and respective coefficients when estimating their values in samples.
`fit.all.with.latents`	logical. Whether to use the previously estimated latent variables when estimating the preliminary species interactions.
`popsize.sel`	the population size for the genetic algorithm, expressed as the factor to multiply by the recommended minimum. Ignored if `do.selection=FALSE`.
`n.cores`	the number of CPU cores to use in the variable selection stage and in the optimization.
`parallel`	logical. Whether to use `optimParallel` during optimizations instead of `optim`.
`true.model`	for validation purposes only: the true model that has generated the data, to which the estimated coefficients will be compared in each selection algorithm iteration.
`do.selection`	logical. Conduct the variable selection stage, over species interaction network topology?
`do.plots`	logical. Plot diagnostic and trace plots?
`fast`	a logical defining whether to do a fast - but less accurate - estimation, or a normal estimation.
`refit.selected`	logical. Refit with exact estimates the best model after network selection? Note that, for performance reasons, the models fit during the network selection stage use an approximate likelihood.
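
To illustrate the `mask.sp` orientation described above (species in the columns affect species in the rows), the following minimal sketch builds a mask in base R. The four-species community and its names are made up for the example; only the 0/1 convention and the row/column orientation come from the argument description.

```r
# Hypothetical 4-species community; the names are illustrative only.
species <- c("sp1", "sp2", "sp3", "sp4")

# Start by allowing every interaction (1 = include, 0 = exclude).
mask <- matrix(1, nrow = length(species), ncol = length(species),
               dimnames = list(affected = species, affecting = species))

# Columns affect rows: assume a priori that sp4 does not affect sp2.
mask["sp2", "sp4"] <- 0

# Assume a priori that sp1 affects no species at all: zero its whole column.
mask[, "sp1"] <- 0

print(mask)
```

A matrix like this could then be supplied as `mask.sp = mask`, presumably with species in the same order as the columns of `occurrences`.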

Details

An Explicit Interaction Community Model (EICM) is a simultaneous equation linear model in which each species model integrates all the other species as predictors, along with measured and latent variables.

This is the main function for fitting EICMs, and is preferred over using eicm.fit directly.

This function conducts the fitting and network topology selection workflow, which includes three stages: 1) estimate latent variable values; 2) make preliminary estimates for species interactions; 3) conduct network topology selection over a reduced model (based on the preliminary estimates).

The selection stage is optional. If not conducted, the species interactions are estimated (all or a subset according to the user-provided constraints), but not selected. See vignette("eicm") for commented examples on a priori excluding interactions.
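
As a sketch of one such a priori exclusion, the base R code below works out which species a prevalence threshold would stop from affecting others, and builds a hand-made mask expressing the same idea. The toy matrix, the threshold of 0.2 and the way prevalence is computed (fraction of non-missing samples with a presence) are assumptions for illustration, not necessarily what eicm does internally.

```r
# Toy binary occurrence matrix (6 samples x 3 species); all values are made up.
occ <- matrix(c(1, 0, 1,
                1, 0, 0,
                0, 0, 1,
                1, 1, NA,
                1, 0, 1,
                0, 0, 1),
              ncol = 3, byrow = TRUE,
              dimnames = list(NULL, c("spA", "spB", "spC")))

# Prevalence taken here as the fraction of non-missing samples in which each
# species occurs (an assumption; eicm may compute it differently).
prevalence <- colMeans(occ, na.rm = TRUE)

# Species at or below the threshold would not be allowed to affect others.
threshold <- 0.2
rare <- names(prevalence)[prevalence <= threshold]

# Hand-made mask with the same intent: columns affect rows, so the columns
# of the rare species are set to 0.
mask <- matrix(1, ncol(occ), ncol(occ),
               dimnames = list(colnames(occ), colnames(occ)))
mask[, rare] <- 0

print(round(prevalence, 3))
print(mask)

# With the package installed and real data, either constraint could be passed
# to a fit that skips the selection stage, for example:
# m <- eicm::eicm(occurrences, exclude.prevalence = threshold, do.selection = FALSE)
# m <- eicm::eicm(occurrences, mask.sp = mask, do.selection = FALSE)
```

The calls at the end are left commented out because a realistic fit needs far more data than this toy matrix and can take some time to run.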

Missing data in the response matrix is allowed.

Value

An eicm.list with the following components:

true.model: a copy of the true.model argument.
latents.only: the model with only the latent variables estimated.
fitted.model: the model with only the species interactions estimated.
selected.model: the final model with all coefficients estimated, after network topology selection. This is the "best" model given the selection criterion (which depends on regularization and penalty).

When accessing the results, remember to pick the model you want (usually, selected.model). plot automatically picks selected.model or, if NULL, fitted.model.
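
The fallback rule just described can be sketched as a small helper. `pick_model` is a hypothetical function written for this example, not part of the eicm package, and the list below is only a stand-in that mimics the documented component names.

```r
# Hypothetical helper (not part of eicm): return selected.model when it is
# present, otherwise fall back to fitted.model.
pick_model <- function(fit) {
  if (!is.null(fit$selected.model)) fit$selected.model else fit$fitted.model
}

# Stand-in for an eicm.list, using placeholder strings instead of real models.
fake_fit <- list(true.model = NULL,
                 latents.only = "latents",
                 fitted.model = "fitted",
                 selected.model = NULL)

# No selection was run in this stand-in, so the fitted model is returned.
print(pick_model(fake_fit))
```

With a real result, `pick_model(m)` would play the same role for code that needs the model object itself rather than a plot.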

Examples

# refer to the vignette for a more detailed explanation

# This can take some time to run

# Load the included parameterized model
data(truemodel)

# make one realization of the model
occurrences <- predict(truemodel, nrepetitions=1)

# Fit and select a model with 2 latent variables to be estimated and all
# interactions possible
m <- eicm(occurrences, n.latent=2, penalty=4, theta.threshold=0.5, n.cores=2)

plot(m)

[Package eicm version 1.0.3 Index]