sparseRBIC_step {sparseR} | R Documentation |
Fit a ranked-sparsity model with forward stepwise RBIC (experimental)
Description
Fit a ranked-sparsity model with forward stepwise RBIC (experimental)
Usage
sparseRBIC_step(
formula,
data,
family = c("gaussian", "binomial", "poisson"),
k = 1,
poly = 1,
ic = c("RBIC", "RAIC", "BIC", "AIC", "EBIC"),
hier = c("strong", "weak", "none"),
sequential = (hier[1] != "none"),
cumulative_k = FALSE,
cumulative_poly = TRUE,
pool = FALSE,
ia_formula = NULL,
pre_process = TRUE,
model_matrix = NULL,
y = NULL,
poly_prefix = "_poly_",
int_sep = "\\:",
pre_proc_opts = c("knnImpute", "scale", "center", "otherbin", "none"),
filter = c("nzv", "zv"),
extra_opts = list(),
trace = 0,
message = TRUE,
...
)
Arguments
formula |
Names of the terms |
data |
Data |
family |
The family of the model |
k |
The maximum order of interactions to consider |
poly |
The maximum order of polynomials to consider |
ic |
The information criterion to use |
hier |
Should hierarchy be enforced (weak or strong)? Must be set with sequential == TRUE (see details) |
sequential |
Should the main effects be considered first, with higher orders then added/considered sequentially? |
cumulative_k |
Should penalties be increased cumulatively as the order of interaction increases? |
cumulative_poly |
Should penalties be increased cumulatively as the order of polynomial increases? |
pool |
Should interactions of order k and polynomials of order k+1 be pooled together for calculating the penalty? |
ia_formula |
Formula to be passed to step_interact via the terms argument |
pre_process |
Should the data be preprocessed (if FALSE, must provide model_matrix) |
model_matrix |
A data frame or matrix specifying the full model matrix (used if !pre_process) |
y |
A vector of responses (used if !pre_process) |
poly_prefix |
If model_matrix is specified, what is the prefix for polynomial terms? |
int_sep |
If model_matrix is specified, what is the separator for interaction terms? |
pre_proc_opts |
List of preprocessing steps (see details) |
filter |
The type of filter applied to main effects + interactions |
extra_opts |
A list of options for all preprocess steps (see details) |
trace |
Should intermediate results of the model selection process be output? |
message |
Should the experimental message be shown? Set to FALSE to suppress it |
... |
additional arguments for running stepwise selection |
Details
This function mirrors sparseR but uses stepwise selection guided by RBIC.
Additionally, setting cumulative_poly or cumulative_k to TRUE increases
the penalty cumulatively based on the order of either polynomial or
interaction.
The hier hierarchy enforcement will only work if sequential == TRUE, and
notably will only consider the "first gen" hierarchy, that is, that all
main effects which make up an interaction are already in the model. It
is therefore possible for a third order interaction (x1:x2:x3) to
enter a model without x1:x2 or x2:x3, so long as x1, x2, and x3 are all
in the model.
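The "first gen" rule described above can be illustrated with a small base R sketch. The helper first_gen_ok is hypothetical and is not part of sparseR; it only assumes that interaction terms are named with a literal ":" separator.

```r
# Hypothetical helper (not part of sparseR): returns TRUE when every main
# effect that makes up `term` is already among the names in `in_model`.
first_gen_ok <- function(term, in_model, sep = ":") {
  parts <- strsplit(term, sep, fixed = TRUE)[[1]]
  all(parts %in% in_model)
}

in_model <- c("x1", "x2", "x3")

# Eligible even though x1:x2 and x2:x3 are not in the model
first_gen_ok("x1:x2:x3", in_model)

# Not eligible, because the main effect x4 is missing
first_gen_ok("x1:x4", in_model)
```

Only the main effects are checked here, which is why the three-way term passes without any of its two-way components being present.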
The options that can be passed to pre_proc_opts are:
knnImpute (should missing data be imputed?)
scale (should data be standardized?)
center (should data be centered to the mean or another value?)
otherbin (should factors with low prevalence be combined?)
none (should no preprocessing be done? can also specify a null object)
The options that can be passed to extra_opts are:
centers (named numeric vector which denotes where each covariate should be centered)
center_fn (alternatively, a function can be specified to calculate center such as min or median)
freq_cut, unique_cut (see ?step_nzv - these get used by the filtering steps)
neighbors (the number of neighbors for knnImpute)
one_hot (see ?step_dummy), this defaults to cell-means coding which can be done in regularized regression (change at your own risk)
raw (should polynomials not be orthogonal? defaults to true because variables are centered and scaled already by this point by default)
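As a rough sketch of how pre_proc_opts and extra_opts might be combined, the call below centers each covariate at its median before scaling. It assumes the sparseR package is installed and uses the built-in iris data purely for illustration; the particular settings (k = 1, poly = 2, median centering) are assumptions, not recommendations.

```r
library(sparseR)

# Illustrative settings only: center at the median, then scale.
# knnImpute is left out because iris has no missing values.
fit <- sparseRBIC_step(
  Sepal.Width ~ .,
  data = iris,
  k = 1,
  poly = 2,
  pre_proc_opts = c("center", "scale"),
  extra_opts = list(center_fn = median)
)
```

Supplying centers (a named numeric vector) instead of center_fn would fix the centering value for each covariate explicitly.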
Value
an object of class sparseRBIC containing the following:
fit |
the final fit object |
srprep |
a |
pen_info |
coefficient-level variable counts, types + names |
data |
the (unprocessed) data |
family |
the family argument (for non-normal, e.g. poisson) |
info |
a list containing meta-info about the procedure |
stats |
the IC for each fit and respective terms included |
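A minimal sketch of fitting a model and inspecting the returned components listed above. It assumes the sparseR package is installed, that the components can be accessed with `$` as in an ordinary list, and uses the built-in iris data for illustration only.

```r
library(sparseR)

# Main effects, pairwise interactions (k = 1) and quadratic terms (poly = 2)
fit <- sparseRBIC_step(Sepal.Width ~ ., data = iris, k = 1, poly = 2)

class(fit)   # expected to include "sparseRBIC"
names(fit)   # components described in the Value section

fit$stats    # IC for each fit and the terms included
fit$fit      # the final fit object
```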