model.formulas.update {CICI} | R Documentation |
Update model formulas based on variable screening
Description
Wrapper function to facilitate variable screening on all models generated through make.model.formulas and return updated formulas in the appropriate format for gformula.
Usage
model.formulas.update(formulas, X, screening = screen.glmnet.cramer,
                      with.s = FALSE, by = NA, ...)
Arguments
formulas |
A named list of length 4 containing model formulas for all Y-/L-/A- and Cnodes. These are likely formulas returned from |
X |
A data frame on which the model formulas are to be evaluated. |
screening |
A screening function. Default is screen.glmnet.cramer. |
with.s |
Logical. If TRUE, a spline, i.e. s(), will be added to all continuous variables. |
by |
A character vector specifying the variables with which to multiply the smooth (if |
... |
Optional arguments to be passed to the screening algorithm. |
Details
The default screening algorithm uses LASSO for variable screening (and Cramer's V on the categorized version of all variables if LASSO fails). It is also possible to provide user-specific screening algorithms.
User-specific algorithms should take the data as first argument and one model formula (i.e. one entry of the list in model.formulas) as second argument, and return a vector of strings containing the names of the variables that remain after screening. Another screening algorithm available in the package is screen.cramersv, which categorizes all variables, calculates their association with the outcome based on Cramer's V, and selects the 4 variables with the strongest associations (this number can be changed with the option nscreen).
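The interface described above for user-specific screening algorithms can be sketched as follows. This is a minimal illustration in base R, not code from the package: the function name screen.spearman, its nscreen argument, and the use of Spearman correlation as the association measure are all assumptions made for this example. It also assumes the formula arrives as a string or formula object with a single outcome on the left-hand side.

```r
# Hypothetical user-written screening function (NOT part of CICI).
# Assumed interface, following the description above:
#   data    : data frame (first argument)
#   formula : one model formula, as string or formula object (second argument)
#   nscreen : number of variables to keep (illustrative option)
# Returns a character vector with the names of the retained variables.
screen.spearman <- function(data, formula, nscreen = 4, ...) {
  f <- stats::as.formula(formula)
  vars <- all.vars(f)
  outcome <- vars[1]
  # candidate predictors: right-hand-side variables that exist in the data
  candidates <- intersect(setdiff(vars[-1], outcome), names(data))
  if (length(candidates) == 0) return(character(0))

  # coerce factors/characters to integer codes so they can be ranked
  to.num <- function(x) if (is.numeric(x)) x else as.numeric(as.factor(x))
  y <- to.num(data[[outcome]])

  # absolute Spearman correlation of each candidate with the outcome
  assoc <- vapply(candidates, function(v) {
    r <- suppressWarnings(
      stats::cor(y, to.num(data[[v]]), method = "spearman",
                 use = "pairwise.complete.obs")
    )
    if (is.na(r)) 0 else abs(r)
  }, numeric(1))

  # keep the nscreen variables with the strongest association
  keep <- seq_len(min(nscreen, length(assoc)))
  names(sort(assoc, decreasing = TRUE))[keep]
}

# Toy data: y depends strongly on x1 only
set.seed(1)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
d$y <- 2 * d$x1 + rnorm(200, sd = 0.1)
kept <- screen.spearman(d, "y ~ x1 + x2 + x3", nscreen = 1)
print(kept)
```

Given the Usage section, a function of this shape would presumably be supplied through the screening argument, e.g. model.formulas.update(formulas, X, screening = screen.spearman); whether additional requirements apply is not stated here, so check the manual before relying on it.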
The manual provides more information.
The fits of the updated models can be evaluated with fit.updated.formulas.
Value
A list of length 4 containing the updated model formulas:
Lnames |
A vector of strings containing updated model formulas for all L nodes. |
Ynames |
A vector of strings containing updated model formulas for all Y nodes. |
Anames |
A vector of strings containing updated model formulas for all A nodes. |
Cnames |
A vector of strings containing updated model formulas for all C nodes. |
See Also
make.model.formulas, model.update, fit.updated.formulas
Examples
data(EFV)
# first: generate generic model formulas
m <- make.model.formulas(X = EFV,
                         Lnodes = c("adherence.1", "weight.1",
                                    "adherence.2", "weight.2",
                                    "adherence.3", "weight.3",
                                    "adherence.4", "weight.4"),
                         Ynodes = c("VL.0", "VL.1", "VL.2", "VL.3", "VL.4"),
                         Anodes = c("efv.0", "efv.1", "efv.2", "efv.3", "efv.4"),
                         evaluate = FALSE)
# second: update these model formulas based on variable screening with LASSO
glmnet.formulas <- model.formulas.update(m$model.names, EFV)
glmnet.formulas
# third: use these models for estimation
est <- gformula(X = EFV,
                Lnodes = c("adherence.1", "weight.1",
                           "adherence.2", "weight.2",
                           "adherence.3", "weight.3",
                           "adherence.4", "weight.4"),
                Ynodes = c("VL.0", "VL.1", "VL.2", "VL.3", "VL.4"),
                Anodes = c("efv.0", "efv.1", "efv.2", "efv.3", "efv.4"),
                Yform = glmnet.formulas$Ynames, Lform = glmnet.formulas$Lnames,
                abar = seq(0, 2, 1))
est