Multivariate variance matrix {tensorEVD} | R Documentation |
Multivariate variance matrix penalization
Description
Ridge penalization of a multi-variate (co)variance matrix taking the form of either a Kronecker or Hadamard product
Usage
Kronecker_cov(Sigma = 1, K, Theta, swap = FALSE,
rows = NULL, cols = NULL,
drop = TRUE, inplace = FALSE)
Hadamard_cov(Sigma = 1, K, Theta, IDS, IDK,
drop = TRUE, inplace = FALSE)
Arguments
Sigma |
(numeric) A variance matrix among features. If is scalar, a scaled identity matrix with the same dimension as |
K |
(numeric) Variance matrix among subjects |
Theta |
(numeric) A diagonal-shifting parameter, value to be added to the diagonals of the resulting (co)variance matrix. It should be a (symmetric) matrix with the same dimension as |
rows |
(integer) Index which rows of the (Kronecker product) (co)variance matrix are to be returned. Default |
cols |
(integer) Index which columns of the (Kronecker product) (co)variance are to be returned. Default |
IDS |
(integer/character) Vector with either indices or row names mapping from rows/columns of |
IDK |
(integer/character) Vector with either indices or row names mapping from rows/columns of |
swap |
(logical) Either |
drop |
(logical) Either |
inplace |
(logical) Either |
Details
Assume that a multi-variate random matrix X with n subjects in rows and p features in columns follows a matrix Gaussian distribution with certain matrix of means M and variance matrix K of dimension n × n between subjects, and Σ of dimension p × p between features.
Kronecker product form.The random variable x = vec(X), formed by stacking columns of X, is a vector of length np that also follow a Gaussian distribution with mean vec(M) and (co)variance covariance matrix taking the Kronecker form
Σ⊗K
In the uni-variate case, the problem of near-singularity can be alleviated by penalizing the variance matrix K by adding positive elements θ to its diagonal, i.e., K + θI, where I is an identity matrix. The same can be applied to the multi-variate case where the Kronecker product (co)variance matrix is penalized with Θ={θij} of dimensions p × p, where diagonal entries will penalize within feature i and off-diagonals will penalize between features i and j. This is,
Σ⊗K + Θ⊗I
The second Kronecker summand
Θ⊗I
is a sparse matrix consisting of non-zero diagonal and sub-diagonals. The Kronecker_cov
function derives the penalized Kronecker (co)variance matrix by computing densely only the first Kronecker summand
Σ⊗K,
and then calculating and adding accordingly only the non-zero entries of
Θ⊗I.
Note: Swapping the order of the matrices in the above Kronecker operations will yield a different result. In this case the penalized matrix
K⊗Σ + I⊗Θ
corresponds to the penalized multi-variate (co)variance matrix of the transposed of the above multi-variate random matrix
X, now with features in rows and subjects in columns. This can be achieved by setting swap=TRUE
in the Kronecker_cov
function.
Assume the random variable x0 is a subset of x containing entries corresponding to specific combinations of subjects and features, then the (co)variance matrix of the vector x0 will be a Hadamard product formed by the entry-wise product of only the elements of Σ and K involved in the combinations contained in x0; this is
(Z1 Σ Z'1) ⊙ (Z2 K Z'2)
where
Z1 and
Z2 are incidence matrices mapping from entries of the random variable x0 to rows (and columns) of Σ and K, respectively. This (co)variance matrix can be obtained using matrix indexing (see help(Hadamard)
), as
Sigma[IDS,IDS]*K[IDK,IDK]
where IDS
and IDK
are integer vectors whose entries are the row (and column) number of
Σ and K, respectively, that are mapped at each row of
Z1 and
Z2, respectively.
The penalized version of this Hadamard product (co)variance matrix will be
(Z1 Σ Z'1) ⊙ (Z2 K Z'2) + (Z1 Θ Z'1) ⊙ (Z2 I Z'2)
The Hadamard_cov
function derives this penalized (co)variance matrix using matrix indexing, as
Sigma[IDS,IDS]*K[IDK,IDK] + Theta[IDS,IDS]*I[IDK,IDK]
Likewise, this function computes densely only the first Hadamard summand and then calculates and adds accordingly only the non-zero entries of the second summand.
Value
Returns the penalized (co)variance matrix formed either as a Kronecker or Hadamard product. For the Kronecker product case, it can be a sub-matrix of the Kronecker product as per the rows
and cols
arguments.
Examples
require(tensorEVD)
# Generate rectangular some covariance matrices
n = 30; p = 10
K = crossprod(matrix(rnorm(n*p), ncol=n)) # n x n matrix
Sigma = crossprod(matrix(rnorm(n*p), ncol=p)) # p x p matrix
Theta = crossprod(matrix(rnorm(n*p), ncol=p)) # p x p matrix
# ==============================================
# Kronecker covariance
# ==============================================
G1 = Kronecker_cov(Sigma, K, Theta = Theta)
# it must equal to:
D = diag(n) # diagonal matrix of dimension n
G2 = Kronecker(Sigma, K) + Kronecker(Theta, D)
all.equal(G1,G2)
# (b) Swapping the order of the matrices
G1 = Kronecker_cov(Sigma, K, Theta, swap = TRUE)
# in this case the kronecker is swapped:
G2 = Kronecker(K, Sigma) + Kronecker(D, Theta)
all.equal(G1,G2)
# (c) Selecting specific entries of the output
# We want only some rows and columns
rows = c(1,3,5)
cols = c(10,30,50)
G1 = Kronecker_cov(Sigma, K, Theta, rows=rows, cols=cols)
# this can be preferable instead of:
G2 = (Kronecker(Sigma, K) + Kronecker(Theta, D))[rows,cols]
all.equal(G1,G2)
# (d) Inplace calculation
# overwrite the output at the same address as the input:
G1 = K[] # copy of K to be used as input
add = pryr::address(G1) # address of G on entry
G1 = Kronecker_cov(Sigma=0.5, G1, Theta=1.5)
pryr::address(G1) == add # on exit, G was moved to a different address
G2 = K[]
add = pryr::address(G2)
G2 = Kronecker_cov(Sigma=0.5, G2, Theta=1.5, inplace=TRUE)
pryr::address(G2) == add # on exit, G remains at the same address
all.equal(G1,G2)
# ==============================================
# Hadamard covariance
# ==============================================
# Define IDs for a Hadamard of size m x m
m = 1000
IDS = sample(1:p, m, replace=TRUE)
IDK = sample(1:n, m, replace=TRUE)
G1 = Hadamard_cov(Sigma, K, Theta, IDS=IDS, IDK=IDK)
# it must equal to:
G2 = Sigma[IDS,IDS]*K[IDK,IDK] + Theta[IDS,IDS]*D[IDK,IDK]
all.equal(G1,G2)
# (b) Inplace calculation
# overwrite the output at the same address as the input:
G1 = K[] # copy of K to be used as input
add = pryr::address(G1) # address of G on entry
G1 = Hadamard_cov(Sigma=0.5, G1, Theta=1.5, IDS=rep(1,n))
pryr::address(G1) == add # on exit, G was moved to a different address
G2 = K[]
add = pryr::address(G2)
G2 = Hadamard_cov(Sigma=0.5, G2, Theta=1.5, IDS=rep(1,n), inplace=TRUE)
pryr::address(G2) == add # on exit, G remains at the same address
all.equal(G1,G2)