corrFamily-design {spaMM}R Documentation

Designing new corrFamily descriptors for parametric correlation families

Description

This documentation describe additional design features to be taken into account when defining a new corrFamily descriptor for a correlation model. Using such a descriptor will be more efficient than the equally general method, of maximizing an objective function of the correlation parameters that calls (say) fitme() on a model including a corrMatrix itself function of the correlation parameters. But this may still be inefficient if a few issues are ignored.

For elements of the corrFamily descriptor for basic cases:

Cf

The function value should (1) be of constant class for all parameter values. For families of mathematically sparse matrices, the CsparseMatrix class is recommended (and more specifically the dsCMatrix class since the matrix is symmetric); (2) have row names that match the levels of the grouping factor (the nested random effect Example shows the code needed when this nested effect is defined from two variables).

tpar

In order to favor the automatic selection of suitable algorithms, tpar should be chosen so that Cf(tpar) is least sparse (i.e., has the minimal number of elements equal to zero) in the correlation family, in terms of its sparsity and of the sparsity of its inverse. A tpar yielding an identity matrix is often a *bad* template as least sparse correlation matrices and their inverses are denser for most families except diagonal ones. For degerate corrFamily objects that describe a constant correlation model rather than a parametric family, use tpar=numeric(0).

type

Do not forget type="precision" it if the return value of Cf is an inverse correlation matrix rather than a correlation matrix, in which case one should specify .

calc_moreargs

should have formal arguments including at least corrfamily and .... The source code of ARp, ARMA or diallel shows the expected structure of its return value.

For advanced features of the corrFamily descriptor:

Af

Af has (minimally) three formal arguments (newdata, term, ...). spaMM will call Af with distinct values of the newdata argument for the fit, and for predictions for new data. For the curious: the term argument that will be provided by spaMM to Af is the formula term for the random effect – an object of class call, as obtained e.g. by
( ~ 1+ corrFamily(1 | longitude + latitude))[[2]][[3]] –, which will provide the names of the variables that need to be taken from the newdata to construct the matrix returned by Af.

Details

Examples

if (spaMM.getOption("example_maxtime")>2 &&
      requireNamespace("agridat", quietly = TRUE)) {

data("onofri.winterwheat", package="agridat")

##### Fitting a Toeplitz correlation model for temporal correlations #####

# A Toeplitz correlation matrix of dimension d*d has d-1 parameters 
# (by symmetry, and with 1s on the main diagonal). These d-1 parameters 
# can be fitted as follows:

Toepfn <- function(v) {
  toepmat <- Matrix::forceSymmetric(toeplitz(c(1,v))) # dsyMatrix
  # Many of the matrices in this family are not valid correlation matrices;
  #   the regularize() function is handy here:
  toepmat <- regularize(toepmat, maxcondnum=1e12)
  # And don't forget the rownames!
  rownames(toepmat) <- unique(onofri.winterwheat$year)
  toepmat
}

(Toepfit <- spaMM::fitme(
  yield ~ gen + corrFamily(1|year), data=onofri.winterwheat, method="REML",
  covStruct=list(corrFamily=list(Cf=Toepfn, tpar=rep(1e-4,6))), 
         # (Note the gentle warning if one instead uses tpar=rep(0,6) here)  
  lower=list(corrPars=list("1"=rep(-0.999,6))), 
  upper=list(corrPars=list("1"=rep(0.999,6))))) 

# The fitted matrix is (nearly) singular, and was regularized:

eigen(Corr(Toepfit)[[1]])$values

# which means that the returned likelihood may be inaccurate, 
# and also that the actual matrix elements differ from input parameters:

Corr(Toepfit)[[1]][1,-1]  

### The usual rules for specifying covStruct, 'lower', 'upper' and 'init' apply
# here when the corrFamily term is the second random-effect:

(Toep2 <- spaMM::fitme(
        yield ~ 1 + (1|gen) + corrFamily(1|year), data=onofri.winterwheat, method="REML",
        covStruct=list("1"=NULL, corrFamily=list(Cf=Toepfn, tpar=rep(1e-4,6))), 
        , init=list(corrPars=list("2"=rep(0.1,6))),
        lower=list(corrPars=list("2"=rep(-0.999,6))), 
        upper=list(corrPars=list("2"=rep(0.999,6)))))

##### Fitting one variance among years per each of 8 genotypes. #####

# First, note that this can be *more efficiently* fitted by another syntax:

### Fit as a constrained random-coefficient model: 
    
# Diagonal matrix of NA's, represented as vector for its lower triangle:
ranCoefs_for_diag <- function(nlevels) { 
  vec <- rep(0,nlevels*(nlevels+1L)/2L)
  vec[cumsum(c(1L,rev(seq(nlevels-1L)+1L)))] <- NA
  vec
} 
    
(by_rC <- spaMM::fitme(yield ~ 1 + (0+gen|year), data=onofri.winterwheat, method="REML",
                       fixed=list(ranCoefs=list("1"=ranCoefs_for_diag(8)))))
                         
### Fit as a corrFamily model:   

gy_levels <- paste0(gl(8,1,length =56,labels=levels(onofri.winterwheat$gen)),":",
                        gl(7,8,labels=unique(onofri.winterwheat$year)))
                        
# A log scale is often suitable for variances, hence is used below;

# a correct but crude implementation of the model is
diagf <- function(logvar) {
  corr_map <- kronecker(Matrix::.symDiagonal(n=7),diag(x=exp(logvar)))
  rownames(corr_map) <- gy_levels
  corr_map
}

# but we can minimize matrix operations as follows:

corr_map <- Matrix::.symDiagonal(n=8,x=seq(8))
rownames(corr_map) <- unique(onofri.winterwheat$gen)
      
diagf <- function(logvar) {
  corr_map@x <- exp(logvar)[corr_map@x]
  corr_map
}                 # (and this returns a dsCMatrix)
      
(by_cF <- spaMM::fitme(
        yield ~ 1 + corrFamily(1|gen %in% year), data=onofri.winterwheat, method="REML",
        covStruct=list(corrFamily = list(Cf=diagf, tpar=rep(1,8))), 
        fixed=list(lambda=1),            # Don't forget to fix this redundant parameter!
        # init=list(corrPars=list("1"=rep(log(O.1),8))), # 'init' optional 
        lower=list(corrPars=list("1"=rep(log(1e-6),8))), # 'lower' and 'upper' required
        upper=list(corrPars=list("1"=rep(log(1e6),8)))))

# => The 'gen' effect is nested in the 'year' effect and this must be specified in the 
# right-hand side of corrFamily(1|gen %in% year) so that the design matrix 'Z' for the 
# random effects to have the correct structure. And then, as for other correlation
# structures (say Matern) it should be necessary to specify only the correlation matrix 
# for a given year, as done above. Should this fail, it is also possible to specify the 
# correlation matrix over years, as done below. spaMM will automatically detect, from   
# its size matching the number of columns of Z, that it must be the matrix over years.  

corr_map <- Matrix::forceSymmetric(kronecker(Matrix::.symDiagonal(n=7),diag(x=seq(8))))
rownames(corr_map) <- gy_levels

diagf <- function(logvar) {
  corr_map@x <- exp(logvar)[corr_map@x]
  corr_map
}                 # (and this returns a dsCMatrix)

(by_cF <- spaMM::fitme(
  yield ~ 1 + corrFamily(1|gen %in% year), data=onofri.winterwheat, method="REML",
  covStruct=list(corrFamily = list(Cf=diagf, tpar=rep(1,8))), 
  fixed=list(lambda=1),            # Don't forget to fix this redundant parameter!
  # init=list(corrPars=list("1"=rep(log(O.1),8))), # 'init' optional 
  lower=list(corrPars=list("1"=rep(log(1e-6),8))), # 'lower' and 'upper' required
  upper=list(corrPars=list("1"=rep(log(1e6),8)))))

exp(get_ranPars(by_cF)$corrPars[[1]]) # fitted variances 
  
}

[Package spaMM version 4.4.16 Index]