R: Fitting penalized frailty models with clustered, multi-event...

frailtyMMpen {frailtyMMpen}

R Documentation

Fitting penalized frailty models with clustered, multi-event and recurrent data using MM algorithm

Description

This function fits penalized frailty regression models. As with the function frailtyMM, three types of models can be fitted (clustered, multi-event and recurrent data). In addition, variable selection can be performed with one of three penalties, LASSO, MCP or SCAD, using the following objective function, where \lambda is the tuning parameter and q is the dimension of \boldsymbol{\beta},

l(\boldsymbol{\beta},\Lambda_0|Y_{obs}) - n\sum_{p=1}^{q} p(|\beta_p|, \lambda).

The BIC is computed using the following equation,

-2l(\hat{\boldsymbol{\beta}}, \hat{\Lambda}_0) + G_n(\hat{S}+1)\log(n),

where G_n=\max\{1, \log(\log(q+1))\} and \hat{S} is the degrees of freedom.
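
As a minimal sketch of the BIC formula above, the following base R function evaluates it for given inputs. The function name `bic_penalized` and its argument names are illustrative assumptions, not part of the package; `n` and `q` are taken to have the same meaning as in the formulas above.

```r
# Sketch of the BIC formula above; names are hypothetical, not package API.
# loglik: observed log-likelihood at the estimates
# df:     degrees of freedom (S-hat in the formula)
# n, q:   as in the formulas above
bic_penalized <- function(loglik, df, n, q) {
  G_n <- max(1, log(log(q + 1)))        # G_n = max{1, log(log(q + 1))}
  -2 * loglik + G_n * (df + 1) * log(n)
}

# G_n stays at 1 for small q and exceeds 1 only once log(log(q + 1)) > 1
bic_penalized(loglik = -250, df = 3, n = 100, q = 10)
bic_penalized(loglik = -250, df = 3, n = 100, q = 30)
```

Because G_n grows only through log(log(q + 1)), the extra penalty on model size increases very slowly with the number of covariates.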

A surrogate function is also derived for the penalty part to allow efficient estimation of the penalized regression. Following the notation used in frailtyMM, let \boldsymbol{\alpha} denote the collection of all parameters and the baseline hazard function. Given that,

by local quadratic approximation,

Thus, the surrogate function given the k^{th} iteration result is as follows,
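
The three steps named above can be sketched in their standard form. Here f(\boldsymbol{\alpha} \mid \boldsymbol{\alpha}^{(k)}) is an assumed symbol for a surrogate that minorizes the log-likelihood, and c^{(k)} is an assumed constant collecting terms that do not depend on \boldsymbol{\alpha}. The package's exact expressions may differ in notation, so this is an illustration under those assumptions.

```latex
% Assumed notation: f minorizes the log-likelihood at the k-th iterate
l(\boldsymbol{\alpha} \mid Y_{obs}) \geq f(\boldsymbol{\alpha} \mid \boldsymbol{\alpha}^{(k)}),

% Standard local quadratic approximation of the penalty, for \beta_p^{(k)} \neq 0
p(|\beta_p|, \lambda) \leq p(|\beta_p^{(k)}|, \lambda)
  + \frac{p^{\prime}(|\beta_p^{(k)}|, \lambda)}{2|\beta_p^{(k)}|}
    \left(\beta_p^{2} - (\beta_p^{(k)})^{2}\right),

% Combining the two bounds gives a surrogate for the penalized objective
l(\boldsymbol{\alpha} \mid Y_{obs}) - n\sum_{p=1}^{q} p(|\beta_p|, \lambda)
  \geq f(\boldsymbol{\alpha} \mid \boldsymbol{\alpha}^{(k)})
  - n\sum_{p=1}^{q} \frac{p^{\prime}(|\beta_p^{(k)}|, \lambda)}{2|\beta_p^{(k)}|}\,\beta_p^{2}
  + c^{(k)}.
```

The quadratic bound holds for penalties that are concave and nondecreasing in |\beta_p|, which covers LASSO, MCP and SCAD, and it makes the penalty part of each update a weighted ridge term.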

Usage

frailtyMMpen(
  formula,
  data,
  frailty = "gamma",
  power = NULL,
  penalty = "LASSO",
  gam = NULL,
  tune = NULL,
  tol = 1e-05,
  maxit = 200,
  ...
)

Arguments

`formula`	Formula where the left-hand side is an object of type `Surv` and the right-hand side contains the variables and additional specifications. The `+cluster()` function specifies the group id for clustered data or the individual id for recurrent data. The `+event()` function specifies the event id for multi-event data (only two events are allowed).
`data`	The `data.frame` where the formula argument can be evaluated.
`frailty`	The frailty used for model fitting. The default is "gamma"; other choices are "lognormal", "invgauss" and "pvf". (Note that computation for the PVF family is slow because the likelihood function has no explicit expression.)
`power`	The power used if PVF frailty is applied.
`penalty`	The penalty used for regularization, the default is "LASSO", other choices are "MCP" and "SCAD".
`gam`	The tuning parameter for MCP and SCAD, which controls the concavity of the penalty. For "MCP", `p^{\prime}(\beta, \lambda)=sign(\beta)(\lambda - \frac{|\beta|}{\gamma})_{+}` and for "SCAD", `p^{\prime}(\beta, \lambda)=\lambda\{I(|\beta| \leq \lambda)+\frac{(\gamma \lambda-|\beta|)_{+}}{(\gamma-1) \lambda} I(|\beta|>\lambda)\}`. The default value of `\gamma` is 3 for MCP and 3.7 for SCAD.
`tune`	The sequence of tuning parameters provided by the user. If not provided, the default grid is applied.
`tol`	The tolerance level for convergence.
`maxit`	Maximum iterations for MM algorithm.
`...`	Additional arguments passed to the function.
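
To make the role of `gam` concrete, the penalty derivatives given for MCP and SCAD can be written as small base R functions. The names `mcp_deriv` and `scad_deriv` are hypothetical helpers for illustration, not functions exported by the package, and the MCP derivative is taken to be zero once |\beta| exceeds \gamma\lambda, as in the standard definition.

```r
# Hypothetical helpers illustrating the penalty derivatives; not package API.

# MCP: sign(beta) * (lambda - |beta| / gam), set to zero beyond gam * lambda
mcp_deriv <- function(beta, lambda, gam = 3) {
  sign(beta) * pmax(lambda - abs(beta) / gam, 0)
}

# SCAD: lambda for |beta| <= lambda, then a linear decay that reaches
# zero at |beta| = gam * lambda
scad_deriv <- function(beta, lambda, gam = 3.7) {
  b <- abs(beta)
  lambda * ((b <= lambda) +
            pmax(gam * lambda - b, 0) / ((gam - 1) * lambda) * (b > lambda))
}

beta_grid <- c(0.2, 1, 2)
mcp_deriv(beta_grid, lambda = 0.5)
scad_deriv(beta_grid, lambda = 0.5)
```

Both derivatives vanish for large coefficients, which is why MCP and SCAD shrink large effects less than LASSO, whose derivative stays at \lambda.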

Details

If tune is not given, the default sequence of tuning parameters is used to provide the regularization path. The formula is the same as the input for the function frailtyMM.

Value

An object of class fmm that contains the following fields:

`coef`	matrix of coefficients estimated from a specific model, where each column corresponds to an input tuning parameter.
`est.tht`	vector of frailty parameters estimated from a specific model with respect to each tuning parameter.
`lambda`	list of frailty for each observation estimated from a specific model with respect to each tuning parameter.
`likelihood`	vector of the observed log-likelihood given estimated parameters with respect to each tuning parameter.
`BIC`	vector of the BIC given estimated parameters with respect to each tuning parameter.
`tune`	vector of tuning parameters used for penalized regression.
`tune.min`	tuning parameter at which the minimum BIC is attained.
`convergence`	convergence threshold.
`input`	The input data re-ordered by cluster id. `y` is the event time, `X` is the covariate matrix and `d` is the status, where 0 indicates censoring.
`y`	input stopping time.
`X`	input covariate matrix.
`d`	input censoring indicator.
`formula`	formula applied as input.
`coefname`	name of each coefficient from input.
`id`	id for individuals or clusters, 1,2,...,a. Note that since the original ids may not form a sequence starting from 1, this output id may not be identical to the original id. The order of id corresponds to the returned `input`.
`N`	total number of observations.
`a`	total number of individuals or clusters.
`datatype`	model used for fitting.
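
As a sketch of how the `coef`, `BIC` and `tune` fields relate, the snippet below builds a mock list with the same field names and picks the column with the smallest BIC. The numbers and the row names `x1` to `x3` are made-up placeholders, not output from the package; a real fit may label its coefficients differently (see `coefname`).

```r
# Mock object: values are placeholders, only the field names follow the list above
fit <- list(
  coef = matrix(c(0.8, 0.0, -0.5,    # column 1: first tuning parameter
                  0.6, 0.0, -0.2,    # column 2: second tuning parameter
                  0.3, 0.0,  0.0),   # column 3: third tuning parameter
                nrow = 3,
                dimnames = list(c("x1", "x2", "x3"), NULL)),
  BIC  = c(512.4, 498.7, 505.1),
  tune = c(0.05, 0.10, 0.20)
)

best <- which.min(fit$BIC)        # position of the smallest BIC
fit$tune[best]                    # tuning parameter at that position
selected <- fit$coef[, best]      # coefficients in the matching column
names(selected)[selected != 0]    # covariates with nonzero coefficients
```

On a fitted object, `tune.min` is described as the tuning parameter with minimal BIC, so it would be expected to agree with `fit$tune[best]` here.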

References

Huang, X., Xu, J. and Zhou, Y. (2022). Profile and Non-Profile MM Modeling of Cluster Failure Time and Analysis of ADNI Data. Mathematics, 10(4), 538.
Huang, X., Xu, J. and Zhou, Y. (2023). Efficient algorithms for survival data with multiple outcomes using the frailty model. Statistical Methods in Medical Research, 32(1), 118-132.

Examples


data(simdataCL)

# Penalized regression under clustered frailty model

# Clustered Gamma Frailty Model

# Using default tuning parameter sequence
gam_cl1 = frailtyMMpen(Surv(time, status) ~ . + cluster(id),
                       simdataCL, frailty = "gamma")


# Using given tuning parameter sequence
gam_cl2 = frailtyMMpen(Surv(time, status) ~ . + cluster(id), 
                       simdataCL, frailty = "gamma", tune = 0.1)

# Obtain the coefficient where minimum BIC is obtained
coef(gam_cl1)

# Obtain the coefficient with tune = 0.2.
coef(gam_cl1, tune = 0.2)

# Plot the regularization path
plot(gam_cl1)

# Get the degrees of freedom and BIC for the sequence of tuning parameters provided
print(gam_cl1)

[Package frailtyMMpen version 1.2.1 Index]