datacggm {cglasso}    R Documentation

Create a Dataset from a Conditional Gaussian Graphical Model with Censored and/or Missing Values

Description

The ‘datacggm’ function is used to create a dataset from a conditional Gaussian graphical model with censored and/or missing values.

Usage

datacggm(Y, lo = -Inf, up = +Inf, X = NULL,
         control = list(maxit = 1.0E+4, thr = 1.0E-4))

Arguments

Y

an (n × p)-dimensional matrix; each row is an observation from a conditional Gaussian graphical model with censoring vectors lo and up. Values missing at random are recorded as ‘NA’.

lo

the lower censoring vector; lo[j] is used to specify the lower censoring value for the random variable Y_j.

up

the upper censoring vector; up[j] is used to specify the upper censoring value for the random variable Y_j.

X

an optional (n × q)-dimensional data frame of predictors. If missing (default), a dataset from a Gaussian graphical model is returned; otherwise, a dataset from a conditional Gaussian graphical model is returned.

control

a named list used to pass the arguments to the EM algorithm (see below for more details). The components are:

  • maxit: maximum number of iterations. Default is 1.0E+4.

  • thr: threshold for convergence. Default is 1.0E-4.

Details

The function ‘datacggm’ returns an R object of class ‘datacggm’, that is, a named list containing the elements needed to fit a conditional graphical LASSO (cglasso) model to datasets with censored and/or missing values.

A set of method functions has been developed to describe data with censored/missing values. For example, the method function ‘print.datacggm’ prints left- and right-censored values using the following rule: a right-censored value is labeled by appending the symbol ‘+’ to the value, whereas the symbol ‘-’ marks left-censored values (see Examples). Summary statistics can be obtained with the method function ‘summary.datacggm’. The matrices Y and X are extracted from a ‘datacggm’ object using the function ‘getMatrix’.
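The labeling rule used when printing can be mimicked in a few lines of base R. The helper ‘label_censored’ below is hypothetical and is not part of cglasso; it assumes scalar censoring bounds, and appending ‘-’ after (rather than before) a left-censored value is an assumption, since the text only says that ‘-’ marks such values.

```r
# Hypothetical helper (not part of cglasso): label censored entries of a
# numeric vector y, assuming scalar bounds lo and up.
# Right-censored entries get a trailing "+"; placing "-" at the end of
# left-censored entries is an assumption.
label_censored <- function(y, lo = -Inf, up = +Inf, digits = 4) {
  lab <- formatC(y, format = "f", digits = digits)  # fixed-point text, no padding
  right <- which(y >= up)                            # right-censored positions
  left  <- which(y <= lo)                            # left-censored positions
  lab[right] <- paste0(lab[right], "+")
  lab[left]  <- paste0(lab[left], "-")
  lab
}

label_censored(c(-1, 0.25, 1), lo = -1, up = 1, digits = 2)
```

With lo = -1 and up = 1, the first value is flagged as left-censored and the last as right-censored, while the interior value is printed unchanged.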

For each column of the matrix ‘Y’, the marginal mean and variance are estimated using a standard EM algorithm under the assumption of a Gaussian distribution; ‘maxit’ and ‘thr’ set the maximum number of iterations and the convergence threshold, respectively. Marginal means and variances can be extracted using the accessor functions ‘ColMeans’ and ‘ColVars’, respectively. Furthermore, the plotting functions ‘hist.datacggm’ and ‘qqcnorm’ can be used to inspect the marginal distribution of each column of the matrix ‘Y’.
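To make the estimation step concrete, here is a minimal base-R sketch of an EM algorithm for the mean and variance of a single Gaussian column censored below at lo and above at up. It is not the package's internal code: the helper name ‘em_censored_normal’, the scalar bounds, the naive starting values and the stopping rule (absolute change in mean plus variance below ‘thr’) are assumptions made for illustration. The E-step replaces each censored entry by the first two moments of the corresponding truncated normal distribution; the M-step updates the mean and variance from those expected moments. ‘NA’ entries are simply dropped, a simplification that is adequate for a single column when values are missing at random.

```r
# Hypothetical helper (not part of cglasso): EM estimates of the marginal
# mean and variance of y ~ N(mu, sigma2), censored below at lo and above
# at up (scalar bounds assumed); NA entries are dropped.
em_censored_normal <- function(y, lo = -Inf, up = +Inf,
                               maxit = 1.0E+4, thr = 1.0E-4) {
  y <- y[!is.na(y)]                      # drop values missing at random
  left  <- y <= lo                       # left-censored entries
  right <- y >= up                       # right-censored entries
  # naive starting values: censored entries taken at face value
  mu <- mean(y)
  sigma2 <- var(y)
  for (it in seq_len(maxit)) {
    sigma <- sqrt(sigma2)
    ey  <- y                             # E[Y_i | data]; observed entries unchanged
    ey2 <- y^2                           # E[Y_i^2 | data]
    if (any(right)) {
      # E-step: moments of N(mu, sigma2) truncated to [up, Inf)
      a <- (up - mu) / sigma
      lambda <- dnorm(a) / pnorm(a, lower.tail = FALSE)
      ey[right]  <- mu + sigma * lambda
      ey2[right] <- mu^2 + 2 * mu * sigma * lambda + sigma2 * (1 + a * lambda)
    }
    if (any(left)) {
      # E-step: moments of N(mu, sigma2) truncated to (-Inf, lo]
      b <- (lo - mu) / sigma
      kappa <- dnorm(b) / pnorm(b)
      ey[left]  <- mu - sigma * kappa
      ey2[left] <- mu^2 - 2 * mu * sigma * kappa + sigma2 * (1 - b * kappa)
    }
    # M-step: update mean and variance from the expected moments
    mu_new <- mean(ey)
    sigma2_new <- mean(ey2) - mu_new^2
    converged <- abs(mu_new - mu) + abs(sigma2_new - sigma2) < thr
    mu <- mu_new
    sigma2 <- sigma2_new
    if (converged) break
  }
  list(mean = mu, var = sigma2, iterations = it)
}

set.seed(1)
y <- rnorm(2000, mean = 0.5, sd = 2)
y[y >= 2] <- 2
y[y <= -1] <- -1
em_censored_normal(y, lo = -1, up = 2)
```

Applying such a function column by column, with the corresponding lo[j] and up[j], gives estimates analogous to those returned by ‘ColMeans’ and ‘ColVars’; the package's own values may differ slightly because of implementation details.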

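The status indicator matrix, denoted by R, can be extracted using the function ‘event’. Its entries encode the status of each observation as follows:

  • ‘R[i, j] = 0’ means that y_{ij} lies inside the open interval (lo[j], up[j]);

  • ‘R[i, j] = -1’ means that y_{ij} is a left-censored value;

  • ‘R[i, j] = +1’ means that y_{ij} is a right-censored value;

  • ‘R[i, j] = +9’ means that y_{ij} is a missing value.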
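As an illustration, a status indicator matrix can be built in base R from Y, lo and up. This is a sketch, not the code behind ‘event’: the helper ‘status_matrix’ is hypothetical, and it assumes the coding 0 = observed inside (lo[j], up[j]), -1 = left-censored, +1 = right-censored, +9 = missing, with scalar bounds recycled to every column.

```r
# Hypothetical helper (not part of cglasso): status indicator matrix for Y,
# assuming the coding 0 = observed, -1 = left-censored,
# +1 = right-censored, +9 = missing.
status_matrix <- function(Y, lo = -Inf, up = +Inf) {
  n <- nrow(Y)
  p <- ncol(Y)
  lo <- rep_len(lo, p)                   # recycle scalar bounds to length p
  up <- rep_len(up, p)
  LO <- matrix(lo, n, p, byrow = TRUE)   # row i holds lo[1], ..., lo[p]
  UP <- matrix(up, n, p, byrow = TRUE)
  R <- matrix(0L, n, p)
  R[which(Y <= LO)] <- -1L               # which() drops NA comparisons
  R[which(Y >= UP)] <- 1L
  R[is.na(Y)] <- 9L
  R
}

Y <- matrix(c(-1, 0.3, 1, NA, -2, 2), nrow = 3, ncol = 2)
status_matrix(Y, lo = -1, up = 1)
```

Under this coding, ‘table(status_matrix(Y, lo, up))’ gives a quick count of observed, censored and missing entries.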

See below for the other functions related to an object of class ‘datacggm’.

Value

‘datacggm’ returns an R object of S3 class “datacggm”, that is, a nested named list containing the following components:

Y

the (n × p)-dimensional matrix Y.

X

the (n × q)-dimensional data frame X.

Info

a named list containing the following components:

  • lo: the lower censoring vector;

  • up: the upper censoring vector;

  • R: the status indicator matrix encoding the censored/missing values (mainly for internal purposes);

  • order: an integer vector used for the ordering of the matrices Y and X (for internal purposes only);

  • Pattern: a matrix encoding the information about the patterns of censored/missing values (for internal purposes only);

  • ym: the estimated marginal means of the random variables Y_j;

  • yv: the estimated marginal variances of the random variables Y_j;

  • n: the sample size;

  • p: the number of response variables;

  • q: the number of columns of the data frame X.

Author(s)

Luigi Augugliaro (luigi.augugliaro@unipa.it)

References

Augugliaro, L., Sottile, G., Wit, E.C., and Vinciotti, V. (2023) <doi:10.18637/jss.v105.i01>. cglasso: An R Package for Conditional Graphical Lasso Inference with Censored and Missing Values. Journal of Statistical Software 105(1), 1–58.

Augugliaro, L., Sottile, G., and Vinciotti, V. (2020a) <doi:10.1007/s11222-020-09945-7>. The conditional censored graphical lasso estimator. Statistics and Computing 30, 1273–1289.

Augugliaro, L., Abbruzzo, A., and Vinciotti, V. (2020b) <doi:10.1093/biostatistics/kxy043>. ℓ1-Penalized censored Gaussian graphical model. Biostatistics 21, e1–e16.

See Also

Related to R objects of class “datacggm” are the accessor functions rowNames, colNames, getMatrix, ColMeans, ColVars, upper, lower and event, the plotting function qqcnorm, and the method functions is.datacggm, dim.datacggm, summary.datacggm and hist.datacggm. The function rcggm can be used to simulate a dataset from a conditional Gaussian graphical model, whereas cglasso is the model-fitting function for the ℓ1-penalized censored Gaussian graphical model.

Examples

set.seed(123)

# a dataset from a right-censored Gaussian graphical model
n <- 100L
p <- 3L
Y <- matrix(rnorm(n * p), n, p)
up <- 1
Y[Y >= up] <- up
Z <- datacggm(Y = Y, up = up)
Z

# a dataset from a conditional censored Gaussian graphical model
n <- 100L
p <- 3L
q <- 2
Y <- matrix(rnorm(n * p), n, p)
up <- 1
lo <- -1
Y[Y >= up] <- up
Y[Y <= lo] <- lo
X <- matrix(rnorm(n * q), n, q)
Z <- datacggm(Y = Y, lo = lo, up = up, X = X)
Z

# a dataset from a conditional censored Gaussian graphical model
# with missing-at-random values
n <- 100L
p <- 3L
q <- 2
Y <- matrix(rnorm(n * p), n, p)
NA.id <- matrix(rbinom(n * p, 1L, 0.01), n, p)
Y[NA.id == 1L] <- NA
up <- 1
lo <- -1
Y[Y >= up] <- up
Y[Y <= lo] <- lo
X <- matrix(rnorm(n * q), n, q)
Z <- datacggm(Y = Y, lo = lo, up = up, X = X)
Z

[Package cglasso version 2.0.7 Index]