extend.cord {ecopower} | R Documentation |
Simulate or extend multivariate abundance data
Description
extend
returns a simulated response matrix or a manyglm
object with N
observations
and simulated response matrix that utilises the existing correlation structure of the data.
Usage
## S3 method for class 'cord'
extend(
object,
N = nrow(object$obj$data),
coeffs = coef(object$obj),
newdata = NULL,
n_replicate = NULL,
do.fit = FALSE,
seed = NULL
)
extend(
object,
N = nrow(object$obj$data),
coeffs = coef(object$obj),
newdata = NULL,
n_replicate = NULL,
do.fit = FALSE,
seed = NULL
)
Arguments
object |
objects of class |
N |
Number of samples to be extended. Defaults to the number of observations in the original sample. |
coeffs |
Coefficient matrix for a |
newdata |
Data frame of same size as the original X covariates from the fitted |
n_replicate |
Number of unique replicates of the original data frame. Defaults to |
do.fit |
Logical. If |
seed |
Random number seed, defaults to a random seed number. |
Details
extend
takes a cord
object and returns a new simulated response matrix or an "extended" manyglm
object
with N
observations and the new simulated response matrix. Response abundances are simulated through a Gaussian
copula model that utilises a coefficient matrix coeffs
, the specified cord
model and the joint
correlation structure exhibited between the response variables. To help with the specification of
coeffs
, see effect_alt
which simplifies this process.
Response variables are simulated through a copula model by first extracting Gaussian copular scores
as Dunn-Smyth residuals (Dunn & Smyth 1996), which are obtained from abundances y_{ij}
with marginal distributions
F_j
which have been specified via the original manyglm
model (fit.glm
; see examples);
z_{ij} = \Phi^{-1}{F_{j}(y_{ij}^-) + u_{ij} f_{j}(y_{ij})}
These scores then follow a multivariate Gaussian distribution with zero mean and covariance structure \Sigma
,
z_{ij} \sim N_p(0,\Sigma)
To avoid estimating a large number p(p-1)/2
pairwise correlations within \Sigma
, factor analysis is utilised
with two latent factor variables, which can be interpreted as an unobserved environmental covariate.
Thus, in order to simulate new multivariate abundances we simulate new copula scores and back transform them to
abundances as y_{ij}= {F^*}_j^{-1}(\Phi(z_{ij}))
, where the coefficient matrix coeffs
specifies the
effect size within the new marginal distributions {F^*}_j
.
The data frame is also extended in a manner that preserves the original design structure. This is done by first
repeating the design matrix until the number of samples exceeds N
, then randomly removing rows from the last
repeated data frame until the number of samples equals N
. Alternatively, a balanced design structure can be
obtained by specifying the number of replicates.
newdata
can be utilised if a different data frame is wanted for simulation.
If users are interested in obtaining a manyglm
model, do.fit=TRUE
can be used to obtain a manyglm
object from the simulated responses.
Value
Simulated data or manyglm
object.
Functions
-
extend()
: Simulate or extend multivariate abundance data
References
Dunn, P.K., & Smyth, G.K. (1996). Randomized quantile residuals. Journal of Computational and Graphical Statistics 5, 236-244.
See Also
Examples
library(ecoCopula)
library(mvabund)
data(spider)
spiddat = mvabund(spider$abund)
X = data.frame(spider$x)
# Specify increasers and decreasers
increasers = c("Alopacce", "Arctlute", "Arctperi", "Pardnigr", "Pardpull")
decreasers = c("Alopcune", "Alopfabr", "Zoraspin")
# Simulate data
fit.glm = manyglm(spiddat~1, family="negative.binomial")
fit.cord = cord(fit.glm)
simData = extend(fit.cord)
# Simulate data with N=20
fit.glm = manyglm(spiddat~soil.dry, family="negative.binomial", data=X)
fit.cord = cord(fit.glm)
simData = extend(fit.cord, N=20)
# Obtain a manyglm fit from simulated data with N=10 and effect_size=1.5
X$Treatment = rep(c("A","B","C","D"),each=7)
fit_factors.glm = manyglm(spiddat~Treatment, family="negative.binomial", data=X)
effect_mat = effect_alt(fit_factors.glm, effect_size=1.5,
increasers, decreasers, term="Treatment")
fit_factors.cord = cord(fit_factors.glm)
newFit.glm = extend(fit_factors.cord, N=10,
coeffs=effect_mat, do.fit=TRUE)
# Change sampling design
X_new = X
X_new$Treatment[6:7] = c("B","B")
simData = extend(fit_factors.cord, N=NULL,
coeffs=effect_mat, newdata=X_new, n_replicate=5)