frailtyMMpen {frailtyMMpen}    R Documentation

Fitting penalized frailty models with clustered, multi-event and recurrent data using MM algorithm

Description

This function fits penalized frailty regression models. As with the function frailtyMM, three types of models can be fitted: clustered, multi-event and recurrent data. In addition, variable selection can be done with three types of penalty, LASSO, MCP and SCAD, using the following objective function, where \lambda is the tuning parameter and q is the dimension of \boldsymbol{\beta},

l(\boldsymbol{\beta},\Lambda_0|Y_{obs}) - n\sum_{p=1}^{q} p(|\beta_p|, \lambda).

The BIC is computed using the following equation,

-2l(\hat{\boldsymbol{\beta}}, \hat{\Lambda}_0) + G_n(\hat{S}+1)\log(n),

where G_n=\max\{1, \log(\log(q+1))\} and \hat{S} is the degrees of freedom.
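As a rough illustration, the BIC formula above can be evaluated directly in base R. The helper name bic_penalized and the input numbers are hypothetical and not part of the package; n is passed as an argument because the text does not define it further.

```r
# Hypothetical helper (not part of frailtyMMpen): evaluates the BIC formula above.
# loglik: observed log-likelihood at the estimates
# df:     degrees of freedom (S-hat)
# n:      the n appearing in the formula
# q:      dimension of beta
bic_penalized <- function(loglik, df, n, q) {
  g_n <- max(1, log(log(q + 1)))
  -2 * loglik + g_n * (df + 1) * log(n)
}

# Made-up inputs, for illustration only
bic_value <- bic_penalized(loglik = -100, df = 3, n = 100, q = 10)
print(bic_value)
```

Note that G_n exceeds 1 only when \log(q+1) > e, that is for q of 15 or more; for smaller q the formula reduces to -2l + (\hat{S}+1)\log(n).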

A surrogate function is also derived for the penalty part, for efficient estimation of the penalized regression. Following the notation used in frailtyMM, let \boldsymbol{\alpha} be the collection of all parameters and the baseline hazard function. Given that,

fig15.png

by local quadratic approximation,

fig16.png

Thus, the surrogate function given the result of the k^{th} iteration is as follows,

fig17.png
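For reference, a common form of the local quadratic approximation, for a penalty that is concave and nondecreasing in |\beta_p|, is sketched below. It is offered as an assumption about the general technique, not as a transcription of the equation images above, and it requires \beta_p^{(k)} \neq 0.

```latex
p(|\beta_p|, \lambda) \leq p(|\beta_p^{(k)}|, \lambda)
  + \frac{p^{\prime}(|\beta_p^{(k)}|, \lambda)}{2|\beta_p^{(k)}|}
    \left\{\beta_p^2 - (\beta_p^{(k)})^2\right\}
```

Under this bound the penalty is replaced by a quadratic in \beta_p at each iteration, so the penalized update has the same form as a ridge-type problem with weights depending on \beta_p^{(k)}.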

Usage

frailtyMMpen(
  formula,
  data,
  frailty = "gamma",
  power = NULL,
  penalty = "LASSO",
  gam = NULL,
  tune = NULL,
  tol = 1e-05,
  maxit = 200,
  ...
)

Arguments

formula

Formula where the left-hand side is an object of type Surv and the right-hand side contains the variables and additional specifications. The +cluster() function specifies the group id for clustered data or the individual id for recurrent data. The +event() function specifies the event id for multi-event data (only two events are allowed).

data

The data.frame where the formula argument can be evaluated.

frailty

The frailty used for model fitting. The default is "gamma", as shown in Usage; other choices are "lognormal", "invgauss" and "pvf". (Note that computation for the PVF family will be slow because the likelihood function has no explicit expression.)

power

The power used if PVF frailty is applied.

penalty

The penalty used for regularization. The default is "LASSO"; other choices are "MCP" and "SCAD".

gam

The tuning parameter for MCP and SCAD, which controls the concavity of the penalty. For "MCP",

p^{\prime}(\beta, \lambda)=sign(\beta)(\lambda - \frac{|\beta|}{\gamma})

and for "SCAD",

p^{\prime}(\beta, \lambda)=\lambda\{I(|\beta| \leq \lambda)+\frac{(\gamma \lambda-|\beta|)_{+}}{(\gamma-1) \lambda} I(|\beta|>\lambda)\}.

The default value of \gamma is 3 for MCP and 3.7 for SCAD.
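The two derivatives above can be sketched in base R to see how \gamma shapes the penalty. The function names mcp_deriv and scad_deriv are hypothetical and not part of the package. The MCP sketch additionally truncates the derivative at zero once |\beta| exceeds \gamma\lambda, which is the usual convention for MCP but is an assumption relative to the formula as written here.

```r
# Hypothetical helpers (not part of frailtyMMpen).

# MCP derivative: sign(beta) * (lambda - |beta| / gam).
# Assumption: truncated at zero once |beta| > gam * lambda.
mcp_deriv <- function(beta, lambda, gam = 3) {
  sign(beta) * pmax(lambda - abs(beta) / gam, 0)
}

# SCAD derivative, following the formula above term by term.
scad_deriv <- function(beta, lambda, gam = 3.7) {
  b <- abs(beta)
  lambda * ((b <= lambda) +
    pmax(gam * lambda - b, 0) / ((gam - 1) * lambda) * (b > lambda))
}

# Evaluate both on a small grid with lambda = 0.5 and the default gam values
beta_grid <- c(-2, -0.25, 0, 0.25, 1, 2)
mcp_vals <- mcp_deriv(beta_grid, lambda = 0.5)
scad_vals <- scad_deriv(beta_grid, lambda = 0.5)
print(rbind(beta = beta_grid, mcp = mcp_vals, scad = scad_vals))
```

In both cases the derivative shrinks to zero for large |\beta|, so large coefficients are penalized less than under LASSO.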

tune

The sequence of tuning parameters provided by the user. If not provided, the default grid will be applied.

tol

The tolerance level for convergence.

maxit

Maximum iterations for MM algorithm.

...

Additional arguments passed to the function.

Details

If tune is not given, the default sequence of tuning parameters is used to provide the regularization path. The formula is the same as the input for the function frailtyMM.

Value

An object of class fmm that contains the following fields:

coef

matrix of coefficients estimated from a specific model, where each column corresponds to an input tuning parameter.

est.tht

vector of frailty parameters estimated from a specific model with respect to each tuning parameter.

lambda

list of the frailties estimated for each observation from a specific model with respect to each tuning parameter.

likelihood

vector of the observed log-likelihood given estimated parameters with respect to each tuning parameter.

BIC

vector of the BIC given estimated parameters with respect to each tuning parameter.

tune

vector of tuning parameters used for penalized regression.

tune.min

tuning parameter at which the minimum BIC is obtained.

convergence

convergence threshold.

input

The input data re-ordered by cluster id. y is the event time, X is the covariate matrix and d is the status, where 0 indicates censoring.

y

input stopping time.

X

input covariate matrix.

d

input censoring indicator.

formula

formula applied as input.

coefname

name of each coefficient from input.

id

id for individuals or clusters, 1, 2, ..., a. Since the original id may not be a sequence starting from 1, this output id may not be identical to the original id. The order of id corresponds to the returned input.

N

total number of observations.

a

total number of individuals or clusters.

datatype

model used for fitting.
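As an illustration of how the fields above relate to one another, the sketch below uses a mock list with made-up numbers in place of a fitted object. Only the field names tune, BIC and coef come from this page; the values, and the idea of counting nonzero coefficients per column, are assumptions for illustration.

```r
# Mock object with made-up numbers, standing in for a fitted "fmm" object.
# Each column of coef corresponds to one tuning parameter in tune.
fit <- list(
  tune = c(0.05, 0.10, 0.20, 0.40),
  BIC  = c(812.4, 805.1, 809.7, 830.2),
  coef = matrix(c(0.9, 0.4, 0.1,
                  0.8, 0.3, 0.0,
                  0.6, 0.0, 0.0,
                  0.2, 0.0, 0.0), nrow = 3)
)

# Index of the tuning parameter with the smallest BIC
best <- which.min(fit$BIC)

# Tuning parameter and coefficient column at the minimum BIC
tune_min <- fit$tune[best]
coef_min <- fit$coef[, best]

# Number of nonzero coefficients along the regularization path
n_nonzero <- colSums(fit$coef != 0)
print(list(tune_min = tune_min, coef_min = coef_min, n_nonzero = n_nonzero))
```

On a real fit, coef() and print() as shown in the Examples are the documented way to obtain this information.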

See Also

frailtyMM

Examples


library(frailtyMMpen)
data(simdataCL)

# Penalized regression under clustered frailty model

# Clustered Gamma Frailty Model

# Using default tuning parameter sequence
gam_cl1 = frailtyMMpen(Surv(time, status) ~ . + cluster(id),
                       simdataCL, frailty = "gamma")


# Using given tuning parameter sequence
gam_cl2 = frailtyMMpen(Surv(time, status) ~ . + cluster(id), 
                       simdataCL, frailty = "gamma", tune = 0.1)

# Obtain the coefficients at the tuning parameter where the minimum BIC is obtained
coef(gam_cl1)

# Obtain the coefficients with tune = 0.2
coef(gam_cl1, tune = 0.2)

# Plot the regularization path
plot(gam_cl1)

# Get the degrees of freedom and BIC for the sequence of tuning parameters provided
print(gam_cl1)




[Package frailtyMMpen version 1.2.1 Index]