Multivariate variance matrix {tensorEVD}R Documentation

Multivariate variance matrix penalization

Description

Ridge penalization of a multi-variate (co)variance matrix taking the form of either a Kronecker or Hadamard product

Usage

Kronecker_cov(Sigma = 1, K, Theta, swap = FALSE,
              rows = NULL, cols = NULL, 
              drop = TRUE, inplace = FALSE)
              
Hadamard_cov(Sigma = 1, K, Theta, IDS, IDK, 
             drop = TRUE, inplace = FALSE) 
              

Arguments

Sigma

(numeric) A variance matrix among features. If is scalar, a scaled identity matrix with the same dimension as Theta is used

K

(numeric) Variance matrix among subjects

Theta

(numeric) A diagonal-shifting parameter, value to be added to the diagonals of the resulting (co)variance matrix. It should be a (symmetric) matrix with the same dimension as Sigma

rows

(integer) Index which rows of the (Kronecker product) (co)variance matrix are to be returned. Default rows=NULL will return all the rows

cols

(integer) Index which columns of the (Kronecker product) (co)variance are to be returned. Default cols=NULL return all the columns

IDS

(integer/character) Vector with either indices or row names mapping from rows/columns of Sigma and Theta into the resulting (Hadamard product) (co)variance matrix

IDK

(integer/character) Vector with either indices or row names mapping from rows/columns of K into the resulting (Hadamard product) (co)variance matrix

swap

(logical) Either TRUE or FALSE (default) to whether swap the order of the matrices in the resulting (Kronecker product) (co)variance matrix

drop

(logical) Either TRUE or FALSE to whether return a uni-dimensional vector when output is a matrix with either 1 row or 1 column as per the rows and cols arguments

inplace

(logical) Either TRUE or FALSE to whether operate directly on matrix K when Sigma and Theta are scalars. This is possible only when rows=NULL and cols=NULL. When TRUE the output will be overwritten on the same address occupied by K. Default inplace=FALSE

Details

Assume that a multi-variate random matrix X with n subjects in rows and p features in columns follows a matrix Gaussian distribution with certain matrix of means M and variance matrix K of dimension n × n between subjects, and Σ of dimension p × p between features.

Kronecker product form.

The random variable x = vec(X), formed by stacking columns of X, is a vector of length np that also follow a Gaussian distribution with mean vec(M) and (co)variance covariance matrix taking the Kronecker form

ΣK

In the uni-variate case, the problem of near-singularity can be alleviated by penalizing the variance matrix K by adding positive elements θ to its diagonal, i.e., K + θI, where I is an identity matrix. The same can be applied to the multi-variate case where the Kronecker product (co)variance matrix is penalized with Θ={θij} of dimensions p × p, where diagonal entries will penalize within feature i and off-diagonals will penalize between features i and j. This is,

ΣK + ΘI

The second Kronecker summand ΘI is a sparse matrix consisting of non-zero diagonal and sub-diagonals. The Kronecker_cov function derives the penalized Kronecker (co)variance matrix by computing densely only the first Kronecker summand ΣK, and then calculating and adding accordingly only the non-zero entries of ΘI.

Note: Swapping the order of the matrices in the above Kronecker operations will yield a different result. In this case the penalized matrix

KΣ + IΘ

corresponds to the penalized multi-variate (co)variance matrix of the transposed of the above multi-variate random matrix X, now with features in rows and subjects in columns. This can be achieved by setting swap=TRUE in the Kronecker_cov function.

Hadamard product form.

Assume the random variable x0 is a subset of x containing entries corresponding to specific combinations of subjects and features, then the (co)variance matrix of the vector x0 will be a Hadamard product formed by the entry-wise product of only the elements of Σ and K involved in the combinations contained in x0; this is

(Z1 Σ Z'1) ⊙ (Z2 K Z'2)

where Z1 and Z2 are incidence matrices mapping from entries of the random variable x0 to rows (and columns) of Σ and K, respectively. This (co)variance matrix can be obtained using matrix indexing (see help(Hadamard)), as

Sigma[IDS,IDS]*K[IDK,IDK]

where IDS and IDK are integer vectors whose entries are the row (and column) number of Σ and K, respectively, that are mapped at each row of Z1 and Z2, respectively.

The penalized version of this Hadamard product (co)variance matrix will be

(Z1 Σ Z'1) ⊙ (Z2 K Z'2) + (Z1 Θ Z'1) ⊙ (Z2 I Z'2)

The Hadamard_cov function derives this penalized (co)variance matrix using matrix indexing, as

Sigma[IDS,IDS]*K[IDK,IDK] + Theta[IDS,IDS]*I[IDK,IDK]

Likewise, this function computes densely only the first Hadamard summand and then calculates and adds accordingly only the non-zero entries of the second summand.

Value

Returns the penalized (co)variance matrix formed either as a Kronecker or Hadamard product. For the Kronecker product case, it can be a sub-matrix of the Kronecker product as per the rows and cols arguments.

Examples

  require(tensorEVD)
  
  # Generate rectangular some covariance matrices 
  n = 30;  p = 10
  
  K = crossprod(matrix(rnorm(n*p), ncol=n))      # n x n matrix 
  Sigma = crossprod(matrix(rnorm(n*p), ncol=p))  # p x p matrix  
  Theta = crossprod(matrix(rnorm(n*p), ncol=p))  # p x p matrix
  
  # ==============================================
  # Kronecker covariance
  # ==============================================
  G1 = Kronecker_cov(Sigma, K, Theta = Theta)

  # it must equal to:
  D = diag(n)    # diagonal matrix of dimension n
  G2 = Kronecker(Sigma, K) + Kronecker(Theta, D)
  all.equal(G1,G2)
  
  # (b) Swapping the order of the matrices
  G1 = Kronecker_cov(Sigma, K, Theta, swap = TRUE)

  # in this case the kronecker is swapped:
  G2 = Kronecker(K, Sigma) + Kronecker(D, Theta)
  all.equal(G1,G2)
  
  # (c) Selecting specific entries of the output
  # We want only some rows and columns
  rows = c(1,3,5)
  cols = c(10,30,50)
  G1 = Kronecker_cov(Sigma, K, Theta, rows=rows, cols=cols)

  # this can be preferable instead of:
  G2 = (Kronecker(Sigma, K) + Kronecker(Theta, D))[rows,cols]
  all.equal(G1,G2)
  
  # (d) Inplace calculation
  # overwrite the output at the same address as the input:
  G1 = K[]                     # copy of K to be used as input
  add  = pryr::address(G1)     # address of G on entry
  G1 = Kronecker_cov(Sigma=0.5, G1, Theta=1.5)
  pryr::address(G1) == add     # on exit, G was moved to a different address
  
  G2 = K[]   
  add  = pryr::address(G2) 
  G2 = Kronecker_cov(Sigma=0.5, G2, Theta=1.5, inplace=TRUE)
  pryr::address(G2) == add     # on exit, G remains at the same address
  all.equal(G1,G2)
  
  # ==============================================
  # Hadamard covariance
  # ==============================================
  # Define IDs for a Hadamard of size m x m
  m = 1000
  IDS = sample(1:p, m, replace=TRUE)
  IDK = sample(1:n, m, replace=TRUE)
  
  G1 = Hadamard_cov(Sigma, K, Theta, IDS=IDS, IDK=IDK)
  
  # it must equal to:
  G2 = Sigma[IDS,IDS]*K[IDK,IDK] + Theta[IDS,IDS]*D[IDK,IDK]
  all.equal(G1,G2)
  
  # (b) Inplace calculation
  # overwrite the output at the same address as the input:
  G1 = K[]                     # copy of K to be used as input
  add  = pryr::address(G1)     # address of G on entry
  G1 = Hadamard_cov(Sigma=0.5, G1, Theta=1.5, IDS=rep(1,n))
  pryr::address(G1) == add     # on exit, G was moved to a different address
  
  G2 = K[]   
  add  = pryr::address(G2)
  G2 = Hadamard_cov(Sigma=0.5, G2, Theta=1.5, IDS=rep(1,n), inplace=TRUE)
  pryr::address(G2) == add     # on exit, G remains at the same address
  all.equal(G1,G2)
  

[Package tensorEVD version 0.1.3 Index]