
ridgePrep {porridge}

R Documentation

Ridge penalized estimation of the precision matrix from data with replicates.

Description

Estimation of the precision matrix from data with replicates through a ridge penalized EM (Expectation-Maximization) algorithm. It assumes a simple 'signal+noise' model in which both the signal and the noise are random variables drawn from multivariate normal distributions, each with its own unstructured precision matrix. Both precision matrices are estimated.
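The 'signal+noise' model can be illustrated with a small base-R simulation. The sketch below is not part of porridge; the helper `rmvnormChol` is hypothetical and draws multivariate normal samples through a Cholesky factor, and the covariance matrices are illustrative choices. It checks empirically that a single observation has covariance equal to the sum of the signal and noise covariances, while two replicates of the same individual covary only through the shared signal.

```r
# Hypothetical helper (not part of porridge): draw n samples from N(0, Sigma).
# chol() returns an upper-triangular R with t(R) %*% R == Sigma.
rmvnormChol <- function(n, Sigma) {
  p <- ncol(Sigma)
  matrix(rnorm(n * p), n, p) %*% chol(Sigma)
}

set.seed(1)
p  <- 3
Sz <- matrix(0.5, p, p); diag(Sz) <- 1   # signal covariance, i.e. the inverse of Omega_z
Se <- diag(0.25, p)                      # noise covariance, i.e. the inverse of Omega_e
n  <- 20000                              # number of individuals

Z  <- rmvnormChol(n, Sz)                 # one signal vector per individual
Y1 <- Z + rmvnormChol(n, Se)             # first replicate: signal + noise
Y2 <- Z + rmvnormChol(n, Se)             # second replicate: same signal, new noise

covY   <- cov(Y1)       # should be close to Sz + Se
crossY <- cov(Y1, Y2)   # should be close to Sz (only the signal is shared)
round(covY, 2)
round(crossY, 2)
```

The agreement is only approximate, since both matrices are sample estimates.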

Usage

ridgePrep(Y, ids, lambdaZ, lambdaE,
          targetZ=matrix(0, ncol(Y), ncol(Y)),
          targetE=matrix(0, ncol(Y), ncol(Y)),
          nInit=100, minSuccDiff=10^(-10))

Arguments

`Y`	Data `matrix` with samples (including the repetitions) as rows and variates as columns.
`ids`	A `numeric` vector, of length `nrow(Y)`, indicating which rows of `Y` belong to the same individual.
`lambdaZ`	A positive `numeric` representing the ridge penalty parameter for the signal precision matrix estimate.
`lambdaE`	A positive `numeric` representing the ridge penalty parameter for the error precision matrix estimate.
`targetZ`	A positive semi-definite target `matrix` towards which the signal precision matrix estimate is shrunken.
`targetE`	A positive semi-definite target `matrix` towards which the error precision matrix estimate is shrunken.
`nInit`	A `numeric` specifying the maximum number of iterations of the EM algorithm.
`minSuccDiff`	A `numeric` specifying the convergence criterion: the minimum successive difference (in terms of the relative change in the absolute difference of the penalized loglikelihood) between two successive estimates to be achieved.
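Two inputs are easy to get wrong: `ids` should have one entry per row of `Y`, and the targets should be positive semi-definite. The sketch below builds `ids` from a vector of replicate counts with `rep()` and checks candidate targets with `eigen()`. The replicate counts, the helper name `isPSD` and its tolerance are assumptions for illustration, not part of porridge.

```r
# Number of replicates K_i for each of n = 4 individuals (illustrative values)
Ki  <- c(2, 3, 2, 4)
# One id per row of Y: individual i is repeated K_i times, rows stacked in order
ids <- rep(seq_along(Ki), times = Ki)
ids

# Hypothetical helper: TRUE if M is symmetric with no eigenvalue below -tol
isPSD <- function(M, tol = 1e-10) {
  isSymmetric(M) &&
    min(eigen(M, symmetric = TRUE, only.values = TRUE)$values) >= -tol
}

p <- 5
isPSD(matrix(0, p, p))               # the default all-zero target
isPSD(diag(p))                       # an identity target
isPSD(matrix(c(1, 2, 2, 1), 2, 2))   # eigenvalues 3 and -1: not a valid target
```

Here `ids` is an integer vector, which R also treats as `numeric`; it assumes the rows of `Y` are stacked individual by individual, as in the example below.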

Details

Data are assumed to originate from a design with replicates. Let $\mathbf{Y}_{i,k_i}$ denote the $k_i$-th replicate ($k_i = 1, \ldots, K_i$) of the $i$-th sample. Each observation is described by a 'signal+noise' model, $\mathbf{Y}_{i,k_i} = \mathbf{Z}_i + \boldsymbol{\varepsilon}_{i,k_i}$, where $\mathbf{Z}_i$ and $\boldsymbol{\varepsilon}_{i,k_i}$ represent the signal and noise, respectively; all replicates of the $i$-th sample share the same signal $\mathbf{Z}_i$. The signal and the noise are assumed to be independent, with $\mathbf{Z}_{i} \sim \mathcal{N}(\mathbf{0}_p, \boldsymbol{\Omega}_z^{-1})$ and $\boldsymbol{\varepsilon}_{i, k_i} \sim \mathcal{N}(\mathbf{0}_p, \boldsymbol{\Omega}_{\varepsilon}^{-1})$. Consequently, each observation follows the multivariate normal law $\mathbf{Y}_{i,k_i} \sim \mathcal{N}(\mathbf{0}_p, \boldsymbol{\Omega}_z^{-1} + \boldsymbol{\Omega}_{\varepsilon}^{-1})$.

The model parameters are estimated by means of a penalized EM algorithm that maximizes the loglikelihood penalized by $\lambda_z \| \boldsymbol{\Omega}_z - \mathbf{T}_z \|_F^2 + \lambda_{\varepsilon} \| \boldsymbol{\Omega}_{\varepsilon} - \mathbf{T}_{\varepsilon} \|_F^2$, in which $\lambda_z$ and $\lambda_{\varepsilon}$ are the penalty parameters (arguments `lambdaZ` and `lambdaE`) and $\mathbf{T}_z$ and $\mathbf{T}_{\varepsilon}$ are the shrinkage targets (arguments `targetZ` and `targetE`) of the signal and noise precision matrices, respectively. For more details see van Wieringen and Chen (2021).
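The penalty term described in this section can be evaluated directly. The function below is a sketch of that term only, written from the formula; it is not porridge's internal code, the name `ridgePenalty` is hypothetical, and the package may scale the penalty differently (for example by a constant factor). The toy precision matrices are illustrative.

```r
# Hypothetical helper (not part of porridge):
# lambdaZ * ||Pz - targetZ||_F^2 + lambdaE * ||Pe - targetE||_F^2
ridgePenalty <- function(Pz, Pe, lambdaZ, lambdaE,
                         targetZ = matrix(0, nrow(Pz), ncol(Pz)),
                         targetE = matrix(0, nrow(Pe), ncol(Pe))) {
  # Squared Frobenius norm: the sum of squared entries
  frob2 <- function(M) sum(M^2)
  lambdaZ * frob2(Pz - targetZ) + lambdaE * frob2(Pe - targetE)
}

p  <- 3
Pz <- diag(2, p)   # toy signal precision matrix
Pe <- diag(4, p)   # toy noise precision matrix

# Zero targets: 1 * (3 * 2^2) + 0.5 * (3 * 4^2) = 12 + 24 = 36
ridgePenalty(Pz, Pe, lambdaZ = 1, lambdaE = 0.5)

# Pz equals its target; Pe - I = 3I, so 0.5 * (3 * 3^2) = 13.5
ridgePenalty(Pz, Pe, 1, 0.5, targetZ = diag(2, p), targetE = diag(p))
```

A precision matrix equal to its target contributes nothing, so larger penalty parameters pull the estimates more strongly towards the targets.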

Value

The function returns a `list` with the following elements:

`Pz`	The estimated signal precision matrix.
`Pe`	The estimated error precision matrix.
`penLL`	The penalized loglikelihood of the estimated model.
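In a Gaussian graphical model, the off-diagonal elements of a precision matrix are commonly read as (negated and scaled) partial correlations. The sketch below shows that standard conversion, which could be applied to a returned precision matrix such as `Ps$Pz`; the helper name `prec2pcor` is hypothetical and not a porridge function, and the toy matrix stands in for an actual estimate.

```r
# Hypothetical helper: partial correlations from a precision matrix P,
# rho_ij = -P_ij / sqrt(P_ii * P_jj), with ones on the diagonal.
prec2pcor <- function(P) {
  d  <- sqrt(diag(P))
  pc <- -P / outer(d, d)
  diag(pc) <- 1
  pc
}

# Toy tridiagonal precision matrix (stands in for an estimate)
P <- matrix(c( 2, -1,  0,
              -1,  2, -1,
               0, -1,  2), 3, 3)
pc <- prec2pcor(P)
# Neighbouring variates get 0.5; variates 1 and 3 get 0 (no edge in the graph)
round(pc, 2)
```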

Author(s)

W.N. van Wieringen.

References

van Wieringen, W.N., Chen, Y. (2021), "Penalized estimation of the Gaussian graphical model from data with replicates", Statistics in Medicine, 40(19), 4279-4293.

Examples

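# load the package (mvtnorm is used below to simulate data)
library(porridge)
set.seed(1)

# set parameters
p        <- 10
Se       <- diag(runif(p))    # error (noise) covariance matrix
Sz       <- matrix(3, p, p)   # signal covariance matrix
diag(Sz) <- 4

# draw data: n individuals, each with 2 to 5 replicates
n   <- 100
ids <- numeric()
Y   <- numeric()
for (i in 1:n){
     Ki <- sample(2:5, 1)
     Zi <- mvtnorm::rmvnorm(1, sigma=Sz)   # signal shared by all replicates of individual i
     for (k in 1:Ki){
          Y   <- rbind(Y, Zi + mvtnorm::rmvnorm(1, sigma=Se))   # signal + noise
          ids <- c(ids, i)
     }
}

# estimate the signal and error precision matrices (lambdaZ = lambdaE = 1)
Ps <- ridgePrep(Y, ids, 1, 1)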

[Package porridge version 0.3.3 Index]