R: Fit LUCID models with one or multiple omics layers

estimate_lucid {LUCIDus}

R Documentation

Fit LUCID models with one or multiple omics layers

Description

EM algorithm to estimate LUCID with one or multiple omics layers

Usage

estimate_lucid(
  lucid_model = c("early", "parallel", "serial"),
  G,
  Z,
  Y,
  CoG = NULL,
  CoY = NULL,
  K,
  init_omic.data.model = "EEV",
  useY = TRUE,
  tol = 0.001,
  max_itr = 1000,
  max_tot.itr = 10000,
  Rho_G = 0,
  Rho_Z_Mu = 0,
  Rho_Z_Cov = 0,
  family = c("normal", "binary"),
  seed = 123,
  init_impute = c("mix", "lod"),
  init_par = c("mclust", "random"),
  verbose = FALSE
)

Arguments

`lucid_model`	The LUCID model to fit: "early" for early integration, "parallel" for LUCID in parallel, "serial" for LUCID in serial
`G`	an N by P matrix representing exposures
`Z`	Omics data. If "early", an N by M matrix; if "parallel", a list in which each element i is a matrix with N rows and P_i features; if "serial", a list in which each element i is either a matrix with N rows and P_i features or a list of two or more matrices, each with N rows and its own number of features
`Y`	a length N vector
`CoG`	an N by V matrix representing covariates to be adjusted for G -> X
`CoY`	an N by K matrix representing covariates to be adjusted for X -> Y
`K`	Number of latent clusters. If "early", an integer greater than or equal to 2; if "parallel", an integer vector of the same length as Z, with each element being an integer greater than or equal to 2; if "serial", a list of the same length as Z, in which each element is either an integer like that for "early" or a list of integers like that for "parallel"
`init_omic.data.model`	a vector of strings specifying the geometric model of the omics data; see ?mclust::mclustModelNames for the available model names
`useY`	logical, if TRUE, EM algorithm fits a supervised LUCID; otherwise unsupervised LUCID.
`tol`	stopping criterion for the EM algorithm
`max_itr`	Maximum iterations of the EM algorithm. If the EM algorithm iterates more than max_itr without converging, the EM algorithm is forced to stop.
`max_tot.itr`	Maximum total number of iterations for the `estimate_lucid` function. `estimate_lucid` may run the EM algorithm multiple times if the algorithm fails to converge.
`Rho_G`	A scalar. The LASSO penalty used to regularize exposures. To tune the penalty, use the wrapper function `lucid`. Currently implemented only for LUCID early integration.
`Rho_Z_Mu`	A scalar. The LASSO penalty used to regularize cluster-specific means for omics data (Z). To tune the penalty, use the wrapper function `lucid`. Currently implemented only for LUCID early integration.
`Rho_Z_Cov`	A scalar. The graphical LASSO penalty used to estimate sparse cluster-specific variance-covariance matrices for omics data (Z). To tune the penalty, use the wrapper function `lucid`. Currently implemented only for LUCID early integration.
`family`	The distribution of the outcome
`seed`	Random seed to initialize the EM algorithm
`init_impute`	Method to initialize the imputation of missing values in LUCID. `mix` uses `mclust::imputeData`, which relies on the EM algorithm for the unrestricted general location model from the mix package, to impute missing values in the omics data; `lod` initializes the imputation by replacing missing values with LOD / sqrt(2), where the LOD of each variable is taken to be its minimum in the omics data.
`init_par`	Method to initialize the EM algorithm. For "early": if `mclust`, the parameters are initialized using the `mclust` package; if `random`, the parameters are initialized by drawing from a uniform distribution. For "parallel", `mclust` is the default for quick convergence. For "serial", each sub-model follows the rules above depending on whether it is an "early" or a "parallel" sub-model.
`verbose`	A flag indicating whether detailed information for each iteration of the EM algorithm is printed to the console. Default is FALSE.
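
The `lod` rule described under `init_impute` can be illustrated in base R. This is a minimal sketch of the rule as stated above, not the package's internal code; `impute_lod` is a hypothetical helper name introduced only for this illustration.

```r
# Hypothetical helper illustrating the `lod` rule; not part of LUCIDus.
impute_lod <- function(Z) {
  for (j in seq_len(ncol(Z))) {
    lod <- min(Z[, j], na.rm = TRUE)      # LOD taken as the observed column minimum
    Z[is.na(Z[, j]), j] <- lod / sqrt(2)  # fill missing entries with LOD / sqrt(2)
  }
  Z
}

# Toy omics matrix with one missing value in each column
Z <- matrix(c(4, 2, NA, 8,
              1, NA, 3, 9), ncol = 2)
Z_imputed <- impute_lod(Z)
```

Observed values are left unchanged; only the `NA` entries are replaced, column by column.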

Value

A list containing the following objects:

res_Beta: estimation for G->X associations
res_Mu: estimation for the mu of the X->Z associations
res_Sigma: estimation for the sigma of the X->Z associations
res_Gamma: estimation for X->Y associations
inclusion.p: inclusion probability of cluster assignment for each observation
K: number of latent clusters for "early"; list of numbers of latent clusters for "parallel" and "serial"
var.names: names for the G, Z, Y variables
init_omic.data.model: pre-specified geometric model of multi-omics data
likelihood: converged LUCID model log likelihood
family: the distribution of the outcome
select: for LUCID early integration only, indicators of whether each exposure and omics feature is selected
useY: whether this LUCID model is supervised
Z: multi-omics data
init_impute: pre-specified imputation method
init_par: pre-specified parameter initialization method
Rho: for LUCID early integration only, pre-specified regularization tuning parameter
N: number of observations
submodel: for LUCID in serial only, storing all the submodels
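
As a rough sketch of how the returned list might be inspected, the code below fits an early-integration model on simulated noise and looks at a few of the elements named above. The simulated data are an assumption made for illustration, the call is skipped when LUCIDus is not installed, and the internal structure of each element is not asserted here, so `str()` is used to display it.

```r
set.seed(1008)
G <- matrix(rnorm(500), nrow = 100)   # 100 observations, 5 exposures
Z <- matrix(rnorm(1000), nrow = 100)  # 100 observations, 10 omics features
Y <- rnorm(100)                       # continuous outcome

# Fit only when the package is available
if (requireNamespace("LUCIDus", quietly = TRUE)) {
  fit <- LUCIDus::estimate_lucid(lucid_model = "early",
                                 G = G, Z = Z, Y = Y,
                                 K = 2, family = "normal", seed = 1008)
  print(fit$K)           # number of latent clusters
  print(fit$likelihood)  # log likelihood of the converged model
  str(fit$res_Beta)      # G -> X association estimates
  str(fit$inclusion.p)   # per-observation cluster inclusion probabilities
}
```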

Examples

i <- 1008
set.seed(i)
G <- matrix(rnorm(500), nrow = 100)
Z1 <- matrix(rnorm(1000), nrow = 100)
Z2 <- matrix(rnorm(1000), nrow = 100)
Z3 <- matrix(rnorm(1000), nrow = 100)
Z4 <- matrix(rnorm(1000), nrow = 100)
Z5 <- matrix(rnorm(1000), nrow = 100)
Z <- list(Z1 = Z1, Z2 = Z2, Z3 = Z3, Z4 = Z4, Z5 = Z5)
Y <- rnorm(100)
CoY <- matrix(rnorm(200), nrow = 100)
CoG <- matrix(rnorm(200), nrow = 100)
fit1 <- estimate_lucid(G = G, Z = Z, Y = Y, K = list(2, 2, 2, 2, 2),
                       lucid_model = "serial",
                       family = "normal",
                       seed = i,
                       CoG = CoG, CoY = CoY,
                       useY = TRUE)

[Package LUCIDus version 3.0.2 Index]