R: Multivariate variance matrix penalization

Multivariate variance matrix {tensorEVD}

R Documentation

Multivariate variance matrix penalization

Description

Ridge penalization of a multi-variate (co)variance matrix taking the form of either a Kronecker or Hadamard product

Usage

Kronecker_cov(Sigma = 1, K, Theta, swap = FALSE,
              rows = NULL, cols = NULL, 
              drop = TRUE, inplace = FALSE)
              
Hadamard_cov(Sigma = 1, K, Theta, IDS, IDK, 
             drop = TRUE, inplace = FALSE)

Arguments

`Sigma`	(numeric) A variance matrix among features. If is scalar, a scaled identity matrix with the same dimension as `Theta` is used
`K`	(numeric) Variance matrix among subjects
`Theta`	(numeric) A diagonal-shifting parameter, value to be added to the diagonals of the resulting (co)variance matrix. It should be a (symmetric) matrix with the same dimension as `Sigma`
`rows`	(integer) Index which rows of the (Kronecker product) (co)variance matrix are to be returned. Default `rows=NULL` will return all the rows
`cols`	(integer) Index which columns of the (Kronecker product) (co)variance are to be returned. Default `cols=NULL` return all the columns
`IDS`	(integer/character) Vector with either indices or row names mapping from rows/columns of `Sigma` and `Theta` into the resulting (Hadamard product) (co)variance matrix
`IDK`	(integer/character) Vector with either indices or row names mapping from rows/columns of `K` into the resulting (Hadamard product) (co)variance matrix
`swap`	(logical) Either `TRUE` or `FALSE` (default) to whether swap the order of the matrices in the resulting (Kronecker product) (co)variance matrix
`drop`	(logical) Either `TRUE` or `FALSE` to whether return a uni-dimensional vector when output is a matrix with either 1 row or 1 column as per the `rows` and `cols` arguments
`inplace`	(logical) Either `TRUE` or `FALSE` to whether operate directly on matrix `K` when `Sigma` and `Theta` are scalars. This is possible only when `rows=NULL` and `cols=NULL`. When `TRUE` the output will be overwritten on the same address occupied by `K`. Default `inplace=FALSE`

Details

Assume that a multi-variate random matrix X with n subjects in rows and p features in columns follows a matrix Gaussian distribution with certain matrix of means M and variance matrix K of dimension n × n between subjects, and Σ of dimension p × p between features.

Kronecker product form.

The random variable x = vec(X), formed by stacking columns of X, is a vector of length np that also follow a Gaussian distribution with mean vec(M) and (co)variance covariance matrix taking the Kronecker form

Σ⊗K

In the uni-variate case, the problem of near-singularity can be alleviated by penalizing the variance matrix K by adding positive elements θ to its diagonal, i.e., K + θI, where I is an identity matrix. The same can be applied to the multi-variate case where the Kronecker product (co)variance matrix is penalized with Θ={θ_ij} of dimensions p × p, where diagonal entries will penalize within feature i and off-diagonals will penalize between features i and j. This is,

Σ⊗K + Θ⊗I

The second Kronecker summand Θ⊗I is a sparse matrix consisting of non-zero diagonal and sub-diagonals. The Kronecker_cov function derives the penalized Kronecker (co)variance matrix by computing densely only the first Kronecker summand Σ⊗K, and then calculating and adding accordingly only the non-zero entries of Θ⊗I.

Note: Swapping the order of the matrices in the above Kronecker operations will yield a different result. In this case the penalized matrix

K⊗Σ + I⊗Θ

corresponds to the penalized multi-variate (co)variance matrix of the transposed of the above multi-variate random matrix X, now with features in rows and subjects in columns. This can be achieved by setting swap=TRUE in the Kronecker_cov function.

Hadamard product form.

Assume the random variable x₀ is a subset of x containing entries corresponding to specific combinations of subjects and features, then the (co)variance matrix of the vector x₀ will be a Hadamard product formed by the entry-wise product of only the elements of Σ and K involved in the combinations contained in x₀; this is

(Z₁ Σ Z'₁) ⊙ (Z₂ K Z'₂)

where Z₁ and Z₂ are incidence matrices mapping from entries of the random variable x₀ to rows (and columns) of Σ and K, respectively. This (co)variance matrix can be obtained using matrix indexing (see help(Hadamard)), as

Sigma[IDS,IDS]*K[IDK,IDK]

where IDS and IDK are integer vectors whose entries are the row (and column) number of Σ and K, respectively, that are mapped at each row of Z₁ and Z₂, respectively.

The penalized version of this Hadamard product (co)variance matrix will be

(Z₁ Σ Z'₁) ⊙ (Z₂ K Z'₂) + (Z₁ Θ Z'₁) ⊙ (Z₂ I Z'₂)

The Hadamard_cov function derives this penalized (co)variance matrix using matrix indexing, as

Sigma[IDS,IDS]*K[IDK,IDK] + Theta[IDS,IDS]*I[IDK,IDK]

Likewise, this function computes densely only the first Hadamard summand and then calculates and adds accordingly only the non-zero entries of the second summand.

Value

Returns the penalized (co)variance matrix formed either as a Kronecker or Hadamard product. For the Kronecker product case, it can be a sub-matrix of the Kronecker product as per the rows and cols arguments.

Examples

  require(tensorEVD)
  
  # Generate rectangular some covariance matrices 
  n = 30;  p = 10
  
  K = crossprod(matrix(rnorm(n*p), ncol=n))      # n x n matrix 
  Sigma = crossprod(matrix(rnorm(n*p), ncol=p))  # p x p matrix  
  Theta = crossprod(matrix(rnorm(n*p), ncol=p))  # p x p matrix
  
  # ==============================================
  # Kronecker covariance
  # ==============================================
  G1 = Kronecker_cov(Sigma, K, Theta = Theta)

  # it must equal to:
  D = diag(n)    # diagonal matrix of dimension n
  G2 = Kronecker(Sigma, K) + Kronecker(Theta, D)
  all.equal(G1,G2)
  
  # (b) Swapping the order of the matrices
  G1 = Kronecker_cov(Sigma, K, Theta, swap = TRUE)

  # in this case the kronecker is swapped:
  G2 = Kronecker(K, Sigma) + Kronecker(D, Theta)
  all.equal(G1,G2)
  
  # (c) Selecting specific entries of the output
  # We want only some rows and columns
  rows = c(1,3,5)
  cols = c(10,30,50)
  G1 = Kronecker_cov(Sigma, K, Theta, rows=rows, cols=cols)

  # this can be preferable instead of:
  G2 = (Kronecker(Sigma, K) + Kronecker(Theta, D))[rows,cols]
  all.equal(G1,G2)
  
  # (d) Inplace calculation
  # overwrite the output at the same address as the input:
  G1 = K[]                     # copy of K to be used as input
  add  = pryr::address(G1)     # address of G on entry
  G1 = Kronecker_cov(Sigma=0.5, G1, Theta=1.5)
  pryr::address(G1) == add     # on exit, G was moved to a different address
  
  G2 = K[]   
  add  = pryr::address(G2) 
  G2 = Kronecker_cov(Sigma=0.5, G2, Theta=1.5, inplace=TRUE)
  pryr::address(G2) == add     # on exit, G remains at the same address
  all.equal(G1,G2)
  
  # ==============================================
  # Hadamard covariance
  # ==============================================
  # Define IDs for a Hadamard of size m x m
  m = 1000
  IDS = sample(1:p, m, replace=TRUE)
  IDK = sample(1:n, m, replace=TRUE)
  
  G1 = Hadamard_cov(Sigma, K, Theta, IDS=IDS, IDK=IDK)
  
  # it must equal to:
  G2 = Sigma[IDS,IDS]*K[IDK,IDK] + Theta[IDS,IDS]*D[IDK,IDK]
  all.equal(G1,G2)
  
  # (b) Inplace calculation
  # overwrite the output at the same address as the input:
  G1 = K[]                     # copy of K to be used as input
  add  = pryr::address(G1)     # address of G on entry
  G1 = Hadamard_cov(Sigma=0.5, G1, Theta=1.5, IDS=rep(1,n))
  pryr::address(G1) == add     # on exit, G was moved to a different address
  
  G2 = K[]   
  add  = pryr::address(G2)
  G2 = Hadamard_cov(Sigma=0.5, G2, Theta=1.5, IDS=rep(1,n), inplace=TRUE)
  pryr::address(G2) == add     # on exit, G remains at the same address
  all.equal(G1,G2)

[Package tensorEVD version 0.1.3 Index]