R: Fit a series of missoNet models with user-supplied...

missoNet {missoNet}

R Documentation

Fit a series of missoNet models with user-supplied regularization parameters for the lasso penalties

Description

This function fits the conditional graphical lasso models to datasets with missing response values. ‘missoNet’ computes the regularization path for the lasso penalties sequentially along the bivariate regularization parameter sequence \{(\lambda_B, \lambda_\Theta)\} provided by the user.

Usage

missoNet(
  X,
  Y,
  lambda.Beta,
  lambda.Theta,
  rho = NULL,
  Beta.maxit = 10000,
  Beta.thr = 1e-08,
  eta = 0.8,
  Theta.maxit = 10000,
  Theta.thr = 1e-08,
  eps = 1e-08,
  penalize.diagonal = TRUE,
  diag.penalty.factor = NULL,
  standardize = TRUE,
  standardize.response = TRUE,
  fit.relax = FALSE,
  parallel = FALSE,
  cl = NULL,
  verbose = 1
)

Arguments

`X`	Numeric predictor matrix (`n\times p`): columns correspond to predictor variables and rows correspond to samples. Missing values are not allowed. There is no need for centering or scaling of the variables. `'X'` should not include a column of ones for an intercept.
`Y`	Numeric response matrix (`n\times q`): columns correspond to response variables and rows correspond to samples. Missing values should be coded as either `'NA'`s or `'NaN'`s. There is no need for centering or scaling of the variables.
`lambda.Beta`	A scalar or a numeric vector: a user-supplied sequence of non-negative value(s) for {`\lambda_B`} used to penalize the elements of the coefficient matrix `\mathbf{B}`. Note that the values will be sequentially visited in the given orders as inputs to the regularization parameter sequence `\{(\lambda_B, \lambda_\Theta)\}`; `'lambda.Beta'` must have the same length as `'lambda.Theta'`.
`lambda.Theta`	A scalar or a numeric vector: a user-supplied sequence of non-negative value(s) for {`\lambda_\Theta`} used to penalize the (off-diagonal) elements of the precision matrix `\mathbf{\Theta}`. Note that the values will be sequentially visited in the given orders as inputs to the regularization parameter sequence `\{(\lambda_B, \lambda_\Theta)\}`; `'lambda.Theta'` must have the same length as `'lambda.Beta'`.
`rho`	(Optional) A scalar or a numeric vector of length `q`: the elements are user-supplied probabilities of missingness for the response variables. The default is `'rho = NULL'` and the program will compute the empirical missing rates for each of the columns of `'Y'` and use them as the working missing probabilities. The default setting should suffice in most cases; misspecified missing probabilities would introduce biases into the model.
`Beta.maxit`	The maximum number of iterations of the fast iterative shrinkage-thresholding algorithm (FISTA) for updating `\hat{\mathbf{B}}`. The default is `'Beta.maxit = 10000'`.
`Beta.thr`	The convergence threshold of the FISTA algorithm for updating `\hat{\mathbf{B}}`; the default is `'Beta.thr = 1.0E-8'`. Iterations stop when the absolute parameter change is less than (`'Beta.thr'` `*` `sum(abs(\hat{\mathbf{B}}))`).
`eta`	The backtracking line search shrinkage factor; the default is `'eta = 0.8'`. Most users will be able to use the default value, some experienced users may want to tune `'eta'` according to the properties of a specific dataset for a faster convergence of the FISTA algorithm. Note that `'eta'` must be in (0, 1).
`Theta.maxit`	The maximum number of iterations of the ‘`glasso`’ algorithm for updating `\hat{\mathbf{\Theta}}`. The default is `'Theta.maxit = 10000'`.
`Theta.thr`	The convergence threshold of the ‘`glasso`’ algorithm for updating `\hat{\mathbf{\Theta}}`; the default is `'Theta.thr = 1.0E-8'`. Iterations stop when the average absolute parameter change is less than (`'Theta.thr'` `*` `ave(abs(offdiag(\hat{\mathbf{\Sigma}})))`), where `\hat{\mathbf{\Sigma}}` denotes the empirical working covariance matrix.
`eps`	A numeric tolerance level for the L1 projection of the empirical covariance matrix; the default is `'eps = 1.0E-8'`. The empirical covariance matrix will be projected onto a L1 ball to have `min(eigen(\hat{\mathbf{\Sigma}})$value)` == `'eps'`, if any of the eigenvalues is less than the specified tolerance. Most users will be able to use the default value.
`penalize.diagonal`	Logical: should the diagonal elements of `\mathbf{\Theta}` be penalized? The default is `'TRUE'`.
`diag.penalty.factor`	Numeric: a separate penalty multiplication factor for the diagonal elements of `\mathbf{\Theta}` when `'penalize.diagonal = TRUE'`. `\lambda_\Theta` is multiplied by this number to allow a differential shrinkage of the diagonal elements. The default is `'NULL'` and the program will guess a value based on an initial estimate of `\mathbf{\Theta}`. This factor could be `'0'` for no shrinkage (equivalent to `'penalize.diagonal = FALSE'`) or `'1'` for an equal shrinkage.
`standardize`	Logical: should the columns of `'X'` be standardized so each has unit variance? The default is `'TRUE'`. The estimated results will always be returned on the original scale. If `'X'` has been standardized prior to fitting the model, you might not wish to standardize it inside the algorithm.
`standardize.response`	Logical: should the columns of `'Y'` be standardized so each has unit variance? The default is `'TRUE'`. The estimated results will always be returned on the original scale. If `'Y'` has been standardized prior to fitting the model, you might not wish to standardize it inside the algorithm.
`fit.relax`	Logical: the default is `'FALSE'`. If `'TRUE'`, the program will re-estimate the edges in the active set (i.e. nonzero off-diagonal elements) of the network structure `\hat{\mathbf{\Theta}}` without penalization (`\lambda_\Theta=0`). This debiased estimate of `\mathbf{\Theta}` could be useful for some interdependency analyses. WARNING: there may be convergence issues if the empirical covariance matrix is not of full rank (e.g. `n < q)`).
`parallel`	Logical: the default is `'FALSE'`. If `'TRUE'`, the program uses clusters to fit models with each element of the `\lambda` sequence `\{(\lambda_B, \lambda_\Theta)\}` in parallel. Must register parallel clusters beforehand, see examples below.
`cl`	A cluster object created by ‘`parallel::makeCluster`’ for parallel evaluations. This is only needed when `'parallel = TRUE'`.
`verbose`	Value of `'0'`, `'1'` or `'2'`. `'verbose = 0'` – silent; `'verbose = 1'` (the default) – limited tracing with progress bars; `'verbose = 2'` – detailed tracing. Note that displaying the progress bars slightly increases the computation overhead compared to the silent mode. The detailed tracing should be used for monitoring progress only when the program runs extremely slowly, and it is not supported under `'parallel = TRUE'`.

Details

‘missoNet’ is the main model-fitting function which is specifically proposed to fit the conditional graphical lasso models / penalized multi-task Gaussian regressions to (corrupted) datasets with response values missing at random (MAR). To facilitate the interpretation of the model, let's temporarily assume that there are no missing values in the data used to fit the model. Suppose we have n observations of both a p-variate predictor X \in \mathcal{R}^p and a q-variate response Y \in \mathcal{R}^q, for the ith sample (i = 1,...,n), ‘missoNet’ assumes the model

Y_i = \mu + X_i\mathbf{B} + E_i,\ \ E_i \sim \mathcal{MVN}(0_q, (\mathbf{\Theta})^{-1}),

where Y_i \in \mathcal{R}^{1\times q} and X_i \in \mathcal{R}^{1\times p} are one realization of the q responses and the p predictors, respectively. E_i \in \mathcal{R}^{1\times q} is an error vector drawn from a multivariate Gaussian distribution.

The regression coefficient matrix \mathbf{B} \in \mathcal{R}^{p\times q} that mapping predictors to responses and the precision (inverse covariance) matrix \mathbf{\Theta} \in \mathcal{R}^{q\times q} that revealing the responses' conditional dependencies are the parameters to be estimated by solving a penalized MLE problem

(\hat{\mathbf{\Theta}},\hat{\mathbf{B}}) = {\mathrm{argmin}}_{\mathbf{\Theta} \succeq 0,\ \mathbf{B}}\ g(\mathbf{\Theta},\mathbf{B}) + \lambda_{\Theta}(\|\mathbf{\Theta}\|_{1,\mathrm{off}} + 1_{n\leq \mathrm{max}(p,q)} \|\mathbf{\Theta}\|_{1,\mathrm{diag}}) + \lambda_{B}\|\mathbf{B}\|_1,

where

g(\mathbf{\Theta},\mathbf{B}) = \mathrm{tr}\left[\frac{1}{n}(\mathbf{Y} - \mathbf{XB})^\top(\mathbf{Y} - \mathbf{XB}) \mathbf{\Theta}\right] - \mathrm{log}|\mathbf{\Theta}|.

The response matrix \mathbf{Y} \in \mathcal{R}^{n\times q} has ith row (Y_i - \frac{1}{n}\sum_{j=1}^n Y_j), and the predictor matrix \mathbf{X} \in \mathcal{R}^{n\times p} has ith row (X_i - \frac{1}{n}\sum_{j=1}^n X_j). The intercept \mu \in \mathcal{R}^{1\times q} is canceled out because of centering of the data matrices \mathbf{Y} and \mathbf{X}. 1_{n\leq \mathrm{max}(p,q)} denotes the indicator function for whether penalizing the diagonal elements of \mathbf{\Theta} or not. When n\leq \mathrm{max}(p,q), a global minimizer of the objective function defined above does not exist without the diagonal penalization.

Missingness in real data is inevitable. In this instance, the estimates based only on complete cases are likely to be biased, and the objective function is likely to no longer be a biconvex optimization problem. In addition, many algorithms cannot be directly employed since they require complete datasets as inputs. ‘missoNet’ aims to handle the specific situation where the response matrix \mathbf{Y} contains values that are missing at random (MAR. Please refer to the vignette or other resources for more information about the differences between MAR, missing completely at random (MCAR) and missing not at random (MNAR)). As it should be, ‘missoNet’ is also applicable to datasets with MCAR response values or without any missing values. The method provides a unified framework for automatically solving a convex modification of the multi-task learning problem defined above, using corrupted datasets. Moreover, ‘missoNet’ enjoys the theoretical and computational benefits of convexity and returns solutions that are comparable/close to the clean conditional graphical lasso estimates. Please refer to the original manuscript (coming soon) for more details of our method.

Value

This function returns a 'list' consisting of the following components:

`est.list`	A named `'list'` storing the lists of results estimated at each of the `\lambda` pairs, (`\lambda_B`, `\lambda_\Theta`). Each sub-`'list'` contains: `Beta`: the penalized estimate of the regression coefficient matrix `\hat{\mathbf{B}}` (`p\times q`). `Theta`: the penalized estimate of the precision matrix `\hat{\mathbf{\Theta}}` (`q\times q`). `mu`: a vector of length `q` storing the model intercept `\hat{\mu}`. `lambda.Beta`: the value of `\lambda_B` used to fit the model. `lambda.Theta`: the value of `\lambda_\Theta` used to fit the model. `relax.net`: the relaxed (debiased) estimate of the conditional network structure `\hat{\mathbf{\Theta}}_\mathrm{rlx}` (`q\times q`) if `'fit.relax = TRUE'` when calling ‘`missoNet`’.
`rho`	A vector of length `q` storing the working missing probabilities for the `q` response variables.
`penalize.diagonal`	Logical: whether the diagonal elements of `\mathbf{\Theta}` were penalized.
`diag.penalty.factor`	The additional penalty multiplication factor for the diagonal elements of `\mathbf{\Theta}` when `'penalize.diagonal'` was returned as `'TRUE'`.

Author(s)

Yixiao Zeng yixiao.zeng@mail.mcgill.ca, Celia M.T. Greenwood and Archer Yi Yang.

Examples

## Simulate a dataset with response values missing completely at random (MCAR), 
## the overall missing rate is around 10%.
set.seed(123)  # reproducibility
sim.dat <- generateData(n = 300, p = 50, q = 20, rho = 0.1, missing.type = "MCAR")
tr <- 1:240  # training set indices
tst <- 241:300  # test set indices
X.tr <- sim.dat$X[tr, ]  # predictor matrix
Y.tr <- sim.dat$Z[tr, ]  # corrupted response matrix


## Fit one missoNet model with two scalars for 'lambda.Beta' and 'lambda.Theta'.
fit1 <- missoNet(X = X.tr, Y = Y.tr, lambda.Beta = 0.1, lambda.Theta = 0.2)


## Fit a series of missoNet models with the lambda pairs := (lambda.Beta, lambda.Theta)
## sequentially extracted from the 'lambda.Beta' and 'lambda.Theta' vectors, note that the 
## two vectors must have the same length.
lamB.vec <- 10^(seq(from = 0, to = -1, length.out = 5))
lamTht.vec <- rep(0.1, 5)
fit2 <- missoNet(X = X.tr, Y = Y.tr, lambda.Beta = lamB.vec, lambda.Theta = lamTht.vec)


## Parallelization on a cluster with two cores.
cl <- parallel::makeCluster(2)
fit2 <- missoNet(X = X.tr, Y = Y.tr, lambda.Beta = lamB.vec, lambda.Theta = lamTht.vec, 
                 parallel = TRUE, cl = cl)
parallel::stopCluster(cl)


## Extract the estimates at ('lamB.vec[1]', 'lamTht.vec[1]').
## The estimates at the subsequent lambda pairs could be accessed in the same way.
Beta.hat <- fit2$est.list[[1]]$Beta
Theta.hat <- fit2$est.list[[1]]$Theta
lambda.Beta <- fit2$est.list[[1]]$lambda.Beta  # equal to 'lamB.vec[1]'
lambda.Theta <- fit2$est.list[[1]]$lambda.Theta  # equal to 'lamTht.vec[1]'


## Fit a series of missoNet models using PRE-STANDARDIZED training data
## if you wish to compare the results with other softwares. 
X.tr.std <- scale(X.tr, center = TRUE, scale = TRUE)
Y.tr.std <- scale(Y.tr, center = TRUE, scale = TRUE)
fit3 <- missoNet(X = X.tr.std, Y = Y.tr.std, lambda.Beta = lamB.vec, lambda.Theta = lamTht.vec,
                 standardize = FALSE, standardize.response = FALSE)

[Package missoNet version 1.2.0 Index]