corrFamily-design {spaMM} | R Documentation |
Designing new corrFamily descriptors for parametric correlation families
Description
This documentation describe additional design features to be taken into account when defining a new corrFamily
descriptor for a correlation model. Using such a descriptor will be more efficient than the equally general method, of maximizing an objective function of the correlation parameters that calls (say) fitme()
on a model including a corrMatrix
itself function of the correlation parameters. But this may still be inefficient if a few issues are ignored.
For elements of the corrFamily descriptor for basic cases:
- Cf
The function value should (1) be of constant class for all parameter values. For families of mathematically sparse matrices, the
CsparseMatrix
class is recommended (and more specifically thedsCMatrix
class since the matrix is symmetric); (2) have row names that match the levels of the grouping factor (the nested random effect Example shows the code needed when this nested effect is defined from two variables).- tpar
In order to favor the automatic selection of suitable algorithms,
tpar
should be chosen so thatCf(tpar)
is least sparse (i.e., has the minimal number of elements equal to zero) in the correlation family, in terms of its sparsity and of the sparsity of its inverse. Atpar
yielding an identity matrix is often a *bad* template as least sparse correlation matrices and their inverses are denser for most families except diagonal ones. For degerate corrFamily objects that describe a constant correlation model rather than a parametric family, usetpar=numeric(0)
.- type
Do not forget
type="precision"
it if the return value ofCf
is an inverse correlation matrix rather than a correlation matrix, in which case one should specify .- calc_moreargs
should have formal arguments including at least
corrfamily
and...
. The source code ofARp
,ARMA
ordiallel
shows the expected structure of its return value.
For advanced features of the corrFamily descriptor:
- Af
Af
has (minimally) three formal arguments(newdata, term, ...)
. spaMM will callAf
with distinct values of thenewdata
argument for the fit, and for predictions for new data. For the curious: theterm
argument that will be provided by spaMM toAf
is the formula term for the random effect – an object of classcall
, as obtained e.g. by
( ~ 1+ corrFamily(1 | longitude + latitude))[[2]][[3]]
–, which will provide the names of the variables that need to be taken from thenewdata
to construct the matrix returned byAf
.
Details
-
spaMM will regularize invalid or nearly-singular correlation or covariance matrices internally if the correlation function has not done so already, but it it better to control this in the correlation function. The
regularize
convenience function is available for that purpose, but parametrizations that avoid the need for regularization are even better, since fitting models with nearly-singular correlation matrices is prone to various difficulties (The Toeplitz example below is good to illustrate potential problems but is otherwise poor as it produces non-positive definite matrices; theARp
constructor illustrates a parametrization that avoids that problem). Users should make sure that any regularized matrix still belongs to the intended parametric family of matrices, and they should keep in mind that the spaMM output will show the input parameters of the unregularized matrix, not the parameters of the regularized one (e.g., in the Toeplitz example below, the fitted matrix is a regularized Toepliz matrix with slightly different coefficients than the input parameters).
And for efficiency,
Let us repeat that the correlation function should return matrices of constant class, and in sparse format when the matrices are indeed mathematically sparse. For mathematically dense matrices (as in the Toeplitz example below), the
dsyMatrix
class may be suitable.Let us repeat that in order to favor the automatic selection of suitable algorithms,
tpar
should be chosen so thatCf(tpar)
is least sparse in the correlation family. For matrices ofCsparseMatrix
, a check is implemented to catch wrong choices oftpar
.For challenging problems (large data, many parameters...) it may pay to optimize a bit the correlation function. The Example of nested effects with heterogenous variance below illustrates a possible trick. In the same cases, It may also pay to try the alternative
algebra
ic methods, by first comparing speed of the different methods (control.HLfit=list(algebra= <"spprec"|"spcorr"|"decorr">)
) for given correlation parameter values, rather than to assume that spaMM will find the best method (even if it often does so).The corrFamily descriptor may optionally contain booleans
possiblyDenseCorr
andsparsePrec
to help spaMM select the most appropriate matrix algebraic methods.sparsePrec
should be set to TRUE if sparse-precision methods are expected to be efficient for fitting the random effect.possiblyDenseCorr
should be set to FALSE if the correlation matrix is expected to be sparse, which means here that less than 15% of its elements are non-zero.
Examples
if (spaMM.getOption("example_maxtime")>2 &&
requireNamespace("agridat", quietly = TRUE)) {
data("onofri.winterwheat", package="agridat")
##### Fitting a Toeplitz correlation model for temporal correlations #####
# A Toeplitz correlation matrix of dimension d*d has d-1 parameters
# (by symmetry, and with 1s on the main diagonal). These d-1 parameters
# can be fitted as follows:
Toepfn <- function(v) {
toepmat <- Matrix::forceSymmetric(toeplitz(c(1,v))) # dsyMatrix
# Many of the matrices in this family are not valid correlation matrices;
# the regularize() function is handy here:
toepmat <- regularize(toepmat, maxcondnum=1e12)
# And don't forget the rownames!
rownames(toepmat) <- unique(onofri.winterwheat$year)
toepmat
}
(Toepfit <- spaMM::fitme(
yield ~ gen + corrFamily(1|year), data=onofri.winterwheat, method="REML",
covStruct=list(corrFamily=list(Cf=Toepfn, tpar=rep(1e-4,6))),
# (Note the gentle warning if one instead uses tpar=rep(0,6) here)
lower=list(corrPars=list("1"=rep(-0.999,6))),
upper=list(corrPars=list("1"=rep(0.999,6)))))
# The fitted matrix is (nearly) singular, and was regularized:
eigen(Corr(Toepfit)[[1]])$values
# which means that the returned likelihood may be inaccurate,
# and also that the actual matrix elements differ from input parameters:
Corr(Toepfit)[[1]][1,-1]
### The usual rules for specifying covStruct, 'lower', 'upper' and 'init' apply
# here when the corrFamily term is the second random-effect:
(Toep2 <- spaMM::fitme(
yield ~ 1 + (1|gen) + corrFamily(1|year), data=onofri.winterwheat, method="REML",
covStruct=list("1"=NULL, corrFamily=list(Cf=Toepfn, tpar=rep(1e-4,6))),
, init=list(corrPars=list("2"=rep(0.1,6))),
lower=list(corrPars=list("2"=rep(-0.999,6))),
upper=list(corrPars=list("2"=rep(0.999,6)))))
##### Fitting one variance among years per each of 8 genotypes. #####
# First, note that this can be *more efficiently* fitted by another syntax:
### Fit as a constrained random-coefficient model:
# Diagonal matrix of NA's, represented as vector for its lower triangle:
ranCoefs_for_diag <- function(nlevels) {
vec <- rep(0,nlevels*(nlevels+1L)/2L)
vec[cumsum(c(1L,rev(seq(nlevels-1L)+1L)))] <- NA
vec
}
(by_rC <- spaMM::fitme(yield ~ 1 + (0+gen|year), data=onofri.winterwheat, method="REML",
fixed=list(ranCoefs=list("1"=ranCoefs_for_diag(8)))))
### Fit as a corrFamily model:
gy_levels <- paste0(gl(8,1,length =56,labels=levels(onofri.winterwheat$gen)),":",
gl(7,8,labels=unique(onofri.winterwheat$year)))
# A log scale is often suitable for variances, hence is used below;
# a correct but crude implementation of the model is
diagf <- function(logvar) {
corr_map <- kronecker(Matrix::.symDiagonal(n=7),diag(x=exp(logvar)))
rownames(corr_map) <- gy_levels
corr_map
}
# but we can minimize matrix operations as follows:
corr_map <- Matrix::.symDiagonal(n=8,x=seq(8))
rownames(corr_map) <- unique(onofri.winterwheat$gen)
diagf <- function(logvar) {
corr_map@x <- exp(logvar)[corr_map@x]
corr_map
} # (and this returns a dsCMatrix)
(by_cF <- spaMM::fitme(
yield ~ 1 + corrFamily(1|gen %in% year), data=onofri.winterwheat, method="REML",
covStruct=list(corrFamily = list(Cf=diagf, tpar=rep(1,8))),
fixed=list(lambda=1), # Don't forget to fix this redundant parameter!
# init=list(corrPars=list("1"=rep(log(O.1),8))), # 'init' optional
lower=list(corrPars=list("1"=rep(log(1e-6),8))), # 'lower' and 'upper' required
upper=list(corrPars=list("1"=rep(log(1e6),8)))))
# => The 'gen' effect is nested in the 'year' effect and this must be specified in the
# right-hand side of corrFamily(1|gen %in% year) so that the design matrix 'Z' for the
# random effects to have the correct structure. And then, as for other correlation
# structures (say Matern) it should be necessary to specify only the correlation matrix
# for a given year, as done above. Should this fail, it is also possible to specify the
# correlation matrix over years, as done below. spaMM will automatically detect, from
# its size matching the number of columns of Z, that it must be the matrix over years.
corr_map <- Matrix::forceSymmetric(kronecker(Matrix::.symDiagonal(n=7),diag(x=seq(8))))
rownames(corr_map) <- gy_levels
diagf <- function(logvar) {
corr_map@x <- exp(logvar)[corr_map@x]
corr_map
} # (and this returns a dsCMatrix)
(by_cF <- spaMM::fitme(
yield ~ 1 + corrFamily(1|gen %in% year), data=onofri.winterwheat, method="REML",
covStruct=list(corrFamily = list(Cf=diagf, tpar=rep(1,8))),
fixed=list(lambda=1), # Don't forget to fix this redundant parameter!
# init=list(corrPars=list("1"=rep(log(O.1),8))), # 'init' optional
lower=list(corrPars=list("1"=rep(log(1e-6),8))), # 'lower' and 'upper' required
upper=list(corrPars=list("1"=rep(log(1e6),8)))))
exp(get_ranPars(by_cF)$corrPars[[1]]) # fitted variances
}