lm_filter {nestedcv} | R Documentation |
Linear model filter
Description
Linear models are fitted on each predictor in turn, with the variables named
in force_vars included in every model. Predictors are ranked by Akaike
information criterion (AIC) value, or can be filtered by the p-value on the
estimated coefficient for that predictor in its model.
Usage
lm_filter(
  y,
  x,
  force_vars = NULL,
  nfilter = NULL,
  p_cutoff = 0.05,
  rsq_cutoff = NULL,
  rsq_method = "pearson",
  type = c("index", "names", "full"),
  keep_factors = TRUE,
  method = 0L,
  mc.cores = 1
)
Arguments
y |
Numeric or integer response vector |
x |
Matrix of predictors. If |
force_vars |
Vector of column names in x to be included as covariates in every model |
nfilter |
Number of predictors to return. If |
p_cutoff |
p-value cut-off. P-values are calculated by t-statistic on the estimated coefficient for the predictor being tested. |
rsq_cutoff |
r^2 cut-off for removing predictors due to collinearity.
Default NULL |
rsq_method |
character string indicating which correlation coefficient
is to be computed. One of "pearson" (default), "kendall", or "spearman".
See |
type |
Type of object returned. Default "index" returns indices, "names" returns predictor names, "full" returns a matrix of statistics (AIC, sigma, coefficient, t-statistic and p-value) for each tested predictor. |
keep_factors |
Logical affecting factors with 3 or more levels.
Dataframes are coerced to a matrix using data.matrix. Binary
factors are converted to numeric values 0/1 and analysed as such. If
|
method |
Integer determining linear model method. See
RcppEigen::fastLmPure() |
mc.cores |
Number of cores for parallelisation using
parallel::mclapply() |
Details
This filter is based on the model y ~ xvar + force_vars, where y is the
response vector, xvar is each column of x taken in turn, and force_vars are
optional covariates extracted from x. It uses RcppEigen::fastLmPure() with
method = 0 as default since it is rank-revealing. method = 3 is significantly
faster but can give errors in estimation of the p-value for variables of zero
variance. The algorithm attempts to detect these and set their stats to NA.
Missing values (NA) in x are not tolerated.
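
The model structure described above can be sketched in base R. This is only an
illustration under stated assumptions: it uses stats::lm() rather than
RcppEigen::fastLmPure(), the simulated data and the names age, g1 to g5, stats,
ranked and keep are hypothetical, and it is not the package's implementation.

```r
# Simulated data; all names and sizes here are illustrative assumptions
set.seed(1)
n <- 100
x <- matrix(rnorm(n * 6), nrow = n,
            dimnames = list(NULL, c("age", paste0("g", 1:5))))
y <- 0.5 * x[, "age"] + 1.5 * x[, "g2"] + rnorm(n)

force_vars <- "age"
test_vars <- setdiff(colnames(x), force_vars)

# Fit y ~ xvar + force_vars for each candidate predictor in turn
stats <- t(sapply(test_vars, function(v) {
  dat <- data.frame(y = y, xvar = x[, v], x[, force_vars, drop = FALSE])
  fit <- lm(y ~ ., data = dat)
  cf <- summary(fit)$coefficients["xvar", ]
  c(AIC = AIC(fit), coef = cf[["Estimate"]],
    t = cf[["t value"]], p = cf[["Pr(>|t|)"]])
}))

# Rank by AIC (lower is better), then apply a p-value cut-off
ranked <- stats[order(stats[, "AIC"]), , drop = FALSE]
keep <- rownames(ranked)[ranked[, "p"] < 0.05]
```

Each row of stats corresponds to one tested predictor; the forced covariate age
enters every model but is never itself tested.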
Parallelisation is available via mclapply(). This is provided mainly for
the use case of the filter being used as standalone. Nesting parallelisation
inside of parallelised nestcv.glmnet() or nestcv.train() loops is not
recommended.
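
As a rough illustration of the standalone pattern, the sketch below distributes
one simple linear model per predictor across cores with parallel::mclapply().
It is a base R stand-in rather than the filter itself; the simulated data and
the names cores and pvals are assumptions made for the example.

```r
library(parallel)

# Simulated data; names and sizes are illustrative assumptions
set.seed(2)
n <- 80
x <- matrix(rnorm(n * 20), nrow = n,
            dimnames = list(NULL, paste0("v", 1:20)))
y <- 2 * x[, "v3"] + rnorm(n)

# Forking is unavailable on Windows, where mclapply() requires mc.cores = 1
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# One model per predictor, spread across cores at the outermost level only
pvals <- unlist(mclapply(colnames(x), function(v) {
  fit <- lm(y ~ x[, v])
  summary(fit)$coefficients[2, "Pr(>|t|)"]
}, mc.cores = cores))
names(pvals) <- colnames(x)
```

Keeping the parallel loop at a single level, as here, avoids the nested
forking that the paragraph above advises against.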
Value
Integer vector of indices (type = "index") or character vector of names
(type = "names") of the filtered predictors, in order of linear model AIC.
Any variables in force_vars, which are incorporated into all models, are
listed first. If type = "full", a matrix of AIC value, sigma (residual
standard error, see summary.lm), coefficient, t-statistic and p-value for
each tested predictor is returned.
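
A minimal usage sketch of the three return types follows. It assumes the
nestedcv package is installed; the simulated data and the names idx, nms and
full are hypothetical and chosen only for illustration.

```r
library(nestedcv)

# Simulated data; names and sizes are illustrative assumptions
set.seed(3)
n <- 100
x <- matrix(rnorm(n * 10), nrow = n,
            dimnames = list(NULL, paste0("v", 1:10)))
y <- 1.5 * x[, "v4"] + rnorm(n)

idx  <- lm_filter(y, x, type = "index")  # column indices of retained predictors
nms  <- lm_filter(y, x, type = "names")  # the same predictors by name
full <- lm_filter(y, x, type = "full")   # matrix of per-predictor statistics
```

The indices in idx can be used to subset the columns of x directly, while full
is useful for inspecting the statistics behind the ranking.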