PCLasso {PCLassoReg} | R Documentation |
Protein complex-based group lasso-Cox model
Description
Construct a PCLasso model based on a gene/protein expression matrix, survival data, and protein complexes.
Usage
PCLasso(
x,
y,
group,
penalty = c("grLasso", "grMCP", "grSCAD"),
standardize = TRUE,
...
)
Arguments
x |
An n x p matrix of gene/protein expression measurements with n samples and p genes/proteins. |
y |
The time-to-event outcome, as a two-column matrix or |
group |
A list of groups. The feature (gene/protein) names in
|
penalty |
The penalty to be applied to the model. For group selection,
one of grLasso, grMCP, or grSCAD. See |
standardize |
Logical flag for |
... |
Arguments to be passed to |
Details
The function PCLasso implements the PCLasso model when the parameter penalty
is set to "grLasso". The PCLasso model is a
prognostic model which selects important predictors at the protein complex
level to achieve accurate prognosis and identify risk protein complexes.
The PCLasso model has three inputs: a gene expression matrix, survival
data, and protein complexes. It estimates the association between gene
expression and survival data at the level of protein complexes. Like the
traditional Lasso-Cox model, PCLasso is based on
the Cox PH model and estimates the Cox regression coefficients by
maximizing the partial likelihood with a regularization penalty. The difference
is that PCLasso selects features at the level of protein complexes rather
than individual genes. Considering that genes usually function by forming
protein complexes, PCLasso regards genes belonging to the same protein
complex as a group, and constructs an l1/l2 penalty based on the sum (i.e.,
l1 norm) of the l2 norms of the regression coefficients of the group
members to perform the selection of features at the group level. Since a
gene may belong to multiple protein complexes, that is, there is overlap
between protein complexes, the classical group Lasso-Cox model for
non-overlapping groups may lead to incorrect sparse solutions. The PCLasso
model deals with the overlapping problem of protein complexes by
constructing a latent group Lasso-Cox model. By reconstructing the gene
expression matrix of the protein complexes, the latent group Lasso-Cox
model is transformed into a non-overlapping group Lasso-Cox model in an
expanded space, which can be directly solved using the classical group
Lasso method. Through the final sparse solution, we can predict the
patient's risk score based on a small set of protein complexes and identify
risk protein complexes that are frequently selected to construct prognostic
models. The penalties grSCAD and grMCP can also be used to identify
survival-related risk protein complexes. They penalize large coefficients
less than grLasso does, so they tend to select fewer risk protein complexes.
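The column duplication behind the latent group formulation described above can be illustrated with a minimal base R sketch. It is not the package's internal code: the toy matrix, the gene and complex names, and the helpers expand_groups and group_penalty are hypothetical and exist only for this illustration. The penalty shown is the unweighted sum of group l2 norms; implementations often also scale each group's term, for example by the square root of the group size.

```r
# Toy expression matrix: 4 samples, 3 genes (hypothetical names).
x <- matrix(
  c(1, 2, 3, 4,
    5, 6, 7, 8,
    9, 10, 11, 12),
  nrow = 4,
  dimnames = list(paste0("s", 1:4), c("g1", "g2", "g3"))
)

# Two overlapping complexes: g2 belongs to both.
complexes <- list(PC1 = c("g1", "g2"), PC2 = c("g2", "g3"))

# Copy the columns of every complex so that each copy belongs to exactly
# one group in the expanded space (hypothetical helper).
expand_groups <- function(x, groups) {
  blocks <- lapply(names(groups), function(nm) {
    block <- x[, groups[[nm]], drop = FALSE]
    colnames(block) <- paste(nm, groups[[nm]], sep = "_")
    block
  })
  list(
    x = do.call(cbind, blocks),
    group = rep(names(groups), lengths(groups))
  )
}

expanded <- expand_groups(x, complexes)

# Unweighted l1/l2 penalty: the sum over groups of the l2 norm of each
# group's coefficients (hypothetical helper).
group_penalty <- function(beta, group) {
  sum(tapply(beta, group, function(b) sqrt(sum(b^2))))
}

# Hypothetical coefficients in the expanded space: only PC1 is active.
beta <- c(3, 4, 0, 0)
group_penalty(beta, expanded$group)
```

In the expanded matrix, gene g2 appears once per complex (PC1_g2 and PC2_g2), so the groups no longer overlap and a standard group lasso solver can be applied to the expanded data.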
Value
An object with S3 class PCLasso containing:
fit |
An object of class |
complexes.dt |
Complexes with features (genes/proteins) not included
in |
References
PCLasso: a protein complex-based, group lasso-Cox model for accurate prognosis and risk protein complex discovery. Brief Bioinform, 2021.
Park, H., Niida, A., Miyano, S. and Imoto, S. (2015) Sparse overlapping group lasso for integrative multi-omics analysis. Journal of Computational Biology, 22, 73-84.
See Also
Examples
# load data
data(survivalData)
data(PCGroups)
x <- survivalData$Exp
y <- survivalData$survData
PC.Human <- getPCGroups(Groups = PCGroups, Organism = "Human",
Type = "EntrezID")
# fit PCLasso model
fit.PCLasso <- PCLasso(x, y, group = PC.Human, penalty = "grLasso")
# fit PCSCAD model
fit.PCSCAD <- PCLasso(x, y, group = PC.Human, penalty = "grSCAD")
# fit PCMCP model
fit.PCMCP <- PCLasso(x, y, group = PC.Human, penalty = "grMCP")