PCLasso {PCLassoReg}    R Documentation

Protein complex-based group lasso-Cox model

Description

Construct a PCLasso model based on a gene/protein expression matrix, survival data, and protein complexes.

Usage

PCLasso(
  x,
  y,
  group,
  penalty = c("grLasso", "grMCP", "grSCAD"),
  standardize = TRUE,
  ...
)

Arguments

x

An n x p matrix of gene/protein expression measurements, with n samples (rows) and p genes/proteins (columns).

y

The time-to-event outcome, as a two-column matrix or Surv object. The first column should be time on study (follow up time); the second column should be a binary variable with 1 indicating that the event has occurred and 0 indicating (right) censoring.
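
As a minimal sketch of the layout described above, the outcome can be assembled as a two-column matrix in base R. The values are made up for illustration, and the column names time and status are an assumption rather than a documented requirement.

```r
# Hypothetical follow-up times (e.g., in months) for five samples
time <- c(12.5, 30.1, 7.8, 45.0, 22.3)
# Event indicator: 1 = event observed, 0 = right-censored
status <- c(1, 0, 1, 0, 1)

# Two-column matrix: time first, event indicator second
y <- cbind(time = time, status = status)

# Basic sanity checks on the layout described above
stopifnot(is.matrix(y), ncol(y) == 2)
stopifnot(all(y[, 1] > 0), all(y[, 2] %in% c(0, 1)))
```

A Surv object built with survival::Surv(time, status) should carry the same information.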

group

A list of groups. The feature (gene/protein) names in group should be consistent with the feature (gene/protein) names in x.
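
The following sketch shows one way to check that group members match the features in x before fitting. It assumes that feature names are stored as the column names of x; all object names and gene identifiers are hypothetical.

```r
# Hypothetical expression matrix: 4 samples x 5 features named by gene ID
x <- matrix(rnorm(20), nrow = 4,
            dimnames = list(paste0("S", 1:4), c("g1", "g2", "g3", "g4", "g5")))

# Hypothetical complexes; "g2" appears in both, and "g9" is not in x
group <- list(complexA = c("g1", "g2"),
              complexB = c("g2", "g3", "g9"))

# Keep only the members that also occur as column names of x
group.in.x <- lapply(group, intersect, colnames(x))

# Share of each complex that is covered by x
coverage <- lengths(group.in.x) / lengths(group)
```

Low coverage for many complexes usually points to a mismatch in identifier type (for example, gene symbols in x but Entrez IDs in group).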

penalty

The penalty to be applied to the model. For group selection, one of grLasso, grMCP, or grSCAD. See grpsurv in the R package grpreg for details.

standardize

Logical flag indicating whether x is standardized before the model is fitted. Default is TRUE.

...

Arguments to be passed to grpsurv in the R package grpreg.

Details

The function PCLasso implements the PCLasso model when the parameter penalty is set to "grLasso". The PCLasso model is a prognostic model that selects important predictors at the protein complex level to achieve accurate prognosis and identify risk protein complexes. It has three inputs: a gene expression matrix, survival data, and protein complexes. It estimates the association between gene expression and survival data at the level of protein complexes.

Similar to the traditional Lasso-Cox model, PCLasso is based on the Cox PH model and estimates the Cox regression coefficients by maximizing the partial likelihood with a regularization penalty. The difference is that PCLasso selects features at the level of protein complexes rather than individual genes. Considering that genes usually function by forming protein complexes, PCLasso regards genes belonging to the same protein complex as a group, and constructs an l1/l2 penalty based on the sum (i.e., l1 norm) of the l2 norms of the regression coefficients of the group members to perform feature selection at the group level.

Since a gene may belong to multiple protein complexes, that is, protein complexes may overlap, the classical group Lasso-Cox model for non-overlapping groups may lead to incorrect sparse solutions. The PCLasso model deals with the overlap between protein complexes by constructing a latent group Lasso-Cox model. By reconstructing the gene expression matrix of the protein complexes, the latent group Lasso-Cox model is transformed into a non-overlapping group Lasso-Cox model in an expanded space, which can be solved directly using the classical group Lasso method. Through the final sparse solution, we can predict a patient's risk score based on a small set of protein complexes and identify risk protein complexes that are frequently selected to construct prognostic models.

The penalties grSCAD and grMCP can also be used to identify survival-related risk protein complexes. They penalize large coefficients less than grLasso does, so they tend to select fewer risk protein complexes.
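
The expansion step described above can be sketched in base R as follows. This is an illustration of the general latent group lasso idea (duplicating the columns of shared genes once per complex), not necessarily how PCLasso implements it internally; all object names are hypothetical.

```r
# Hypothetical expression matrix: 3 samples x 3 genes
x <- matrix(1:9, nrow = 3,
            dimnames = list(paste0("S", 1:3), c("g1", "g2", "g3")))

# Two hypothetical overlapping complexes sharing gene "g2"
group <- list(complexA = c("g1", "g2"),
              complexB = c("g2", "g3"))

# Copy the columns of x once per complex membership, so "g2" appears twice
blocks <- lapply(group, function(genes) x[, genes, drop = FALSE])
x.expanded <- do.call(cbind, unname(blocks))

# Non-overlapping group label for every column of the expanded matrix
group.label <- rep(names(group), times = lengths(group))

# Make column names unique by prefixing the complex name
colnames(x.expanded) <- paste(group.label, colnames(x.expanded), sep = ".")
```

In the expanded matrix every column belongs to exactly one group, so x.expanded and group.label have the form expected by a non-overlapping group lasso solver such as grpsurv in the grpreg package.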

Value

An object with S3 class PCLasso containing:

fit

An object of class grpsurv.

complexes.dt

The protein complexes after filtering out features (genes/proteins) that are not included in x.

References

PCLasso: a protein complex-based, group lasso-Cox model for accurate prognosis and risk protein complex discovery. Brief Bioinform, 2021.

Park, H., Niida, A., Miyano, S. and Imoto, S. (2015) Sparse overlapping group lasso for integrative multi-omics analysis. Journal of Computational Biology: A Journal of Computational Molecular Cell Biology, 22, 73-84.

See Also

predict.PCLasso, cv.PCLasso

Examples

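# load the package and the example data
library(PCLassoReg)
data(survivalData)
data(PCGroups)

# expression matrix (samples x genes) and survival outcome
x <- survivalData$Exp
y <- survivalData$survData

# human protein complexes with members given as Entrez IDs
PC.Human <- getPCGroups(Groups = PCGroups, Organism = "Human",
                        Type = "EntrezID")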

# fit PCLasso model
fit.PCLasso <- PCLasso(x, y, group = PC.Human, penalty = "grLasso")

# fit PCSCAD model
fit.PCSCAD <- PCLasso(x, y, group = PC.Human, penalty = "grSCAD")

# fit PCMCP model
fit.PCMCP <- PCLasso(x, y, group = PC.Human, penalty = "grMCP")

[Package PCLassoReg version 1.0.0 Index]