stabsel {mboost} | R Documentation |
Stability Selection
Selection of influential variables or model components with error control.
## a method to compute stability selection paths for fitted mboost models
## S3 method for class 'mboost'
stabsel(x, cutoff, q, PFER, grid = 0:mstop(x),
        folds = subsample(model.weights(x), B = B),
        B = ifelse(sampling.type == "MB", 100, 50),
        assumption = c("unimodal", "r-concave", "none"),
        sampling.type = c("SS", "MB"),
        papply = mclapply, verbose = TRUE, FWER, eval = TRUE, ...)
## just a wrapper to stabsel(p, ..., eval = FALSE)
## S3 method for class 'mboost'
stabsel_parameters(p, ...)
x, p |
a fitted model of class "mboost". |
cutoff |
cutoff between 0.5 and 1. Preferably a value between 0.6 and 0.9 should be used. |
q |
number of (unique) variables (or groups of variables, depending on the model) that are selected on each subsample. |
PFER |
upper bound for the per-family error rate. This specifies the number of falsely selected base-learners that is tolerated. See details. |
grid |
a numeric vector of the form |
folds |
a weight matrix with number of rows equal to the number
of observations, see |
assumption |
Defines the type of assumptions on the
distributions of the selection probabilities and simultaneous
selection probabilities. Only applicable for
sampling.type |
use sampling scheme of Shah & Samworth
(2013), i.e., with complementary pairs ( |
B |
number of subsampling replicates. By default, we use 50
complementary pairs for the error bounds of Shah & Samworth (2013)
and 100 for the error bound derived in Meinshausen & Buehlmann
(2010). As we use |
papply |
(parallel) apply function, defaults to
verbose |
logical (default: |
FWER |
deprecated. Only for compatibility with older versions; use PFER instead. |
eval |
logical. Determines whether stability selection is
evaluated ( |
... |
additional arguments to parallel apply methods such as
For details see stabsel in package stabs and Hofner et al. (2015).
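As a rough illustration of how cutoff, q and PFER depend on each other, the following base-R sketch evaluates the error bound of Meinshausen & Buehlmann (2010), PFER <= q^2 / ((2 * cutoff - 1) * p), solved for the cutoff. The helper name mb_cutoff and the setting p = 10 are assumptions made for this illustration and are not part of mboost or stabs. The tighter bounds of Shah & Samworth (2013) are not covered, and stabsel_parameters should be preferred in practice.

```r
## Hypothetical helper (not part of mboost or stabs): solve the
## Meinshausen & Buehlmann (2010) bound
##   PFER <= q^2 / ((2 * cutoff - 1) * p)
## for the cutoff, given q, PFER and the number of candidates p.
mb_cutoff <- function(q, PFER, p) {
    stopifnot(q > 0, PFER > 0, p > 0)
    ## the cutoff is a threshold on a probability, so cap it at 1
    min(1, (q^2 / (PFER * p) + 1) / 2)
}

## assumed setting: p = 10 candidate base-learners, q = 3, PFER = 1
cutoff <- mb_cutoff(q = 3, PFER = 1, p = 10)
cutoff

## plugging the cutoff back into the bound recovers the requested PFER
bound <- 3^2 / ((2 * cutoff - 1) * 10)
bound
```

With these values the cutoff is 0.95, which lies above the recommended range of 0.6 to 0.9. A larger q or a larger PFER would lower it.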
An object of class stabsel with a special print method.
The object has the following elements:
phat |
selection probabilities. |
selected |
elements with maximal selection probability greater than the cutoff. |
max |
maximum of selection probabilities. |
cutoff |
cutoff used. |
q |
average number of selected variables used. |
PFER |
per-family error rate. |
sampling.type |
the sampling type used for stability selection. |
assumption |
the assumptions made on the selection probabilities. |
call |
the call. |
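How phat, max and selected relate to each other can be sketched in base R. The selection frequencies below are made up, the layout of phat (one row per variable, one column per grid value) is an assumption, and so is the use of >= at the boundary. The actual object is computed by stabsel.

```r
## made-up selection frequencies: 4 variables (rows) over 3 grid values
## (columns); all names and numbers are illustrative only
phat <- matrix(c(0.10, 0.55, 0.97,
                 0.05, 0.20, 0.40,
                 0.30, 0.80, 0.96,
                 0.00, 0.10, 0.15),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("x1", "x2", "x3", "x4"), c("1", "2", "3")))
cutoff <- 0.95

## maximum selection frequency of each variable along the grid
max_phat <- apply(phat, 1, max)

## variables whose maximum reaches the cutoff (assumed rule: >=)
selected <- which(max_phat >= cutoff)
selected
```

Here only x1 and x3 reach the cutoff, so they are the only entries of selected.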
B. Hofner, L. Boccuto and M. Goeker (2015), Controlling false discoveries in high-dimensional situations: Boosting with stability selection. BMC Bioinformatics, 16:144.
N. Meinshausen and P. Buehlmann (2010), Stability selection. Journal of the Royal Statistical Society, Series B, 72, 417–473.
R.D. Shah and R.J. Samworth (2013), Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society, Series B, 75, 55–80.
See Also
Examples
## make data set available
data("bodyfat", package = "TH.data")
## set seed for reproducibility
set.seed(1234)
### low-dimensional example
mod <- glmboost(DEXfat ~ ., data = bodyfat)
## compute cutoff ahead of running stabsel to see if it is a sensible
## parameter choice.
## p = ncol(bodyfat) - 1 (= Outcome) + 1 ( = Intercept)
stabsel_parameters(q = 3, PFER = 1, p = ncol(bodyfat) - 1 + 1,
                   sampling.type = "MB")
## the same:
stabsel(mod, q = 3, PFER = 1, sampling.type = "MB", eval = FALSE)
## Not run: ############################################################
## Do not run and check these examples automatically as
## they take some time (~ 10 seconds depending on the system)
## now run stability selection
(sbody <- stabsel(mod, q = 3, PFER = 1, sampling.type = "MB"))
opar <- par(mai = par("mai") * c(1, 1, 1, 2.7))
plot(sbody, type = "maxsel", ymargin = 6)
## restore the previous graphical parameters
par(opar)
## End(Not run)
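The folds argument expects a weight matrix with one row per observation. What such a matrix might look like for the two sampling types can be sketched in base R. The helper half_samples and all object names are hypothetical. In practice the weights come from subsample, as in the usage above, and how stabsel extends them internally is not shown here.

```r
## Sketch of 0/1 subsampling weights; the helper name is hypothetical.
## Each column is one subsample that keeps floor(n / 2) observations.
half_samples <- function(n, B) {
    replicate(B, {
        w <- integer(n)
        w[sample.int(n, floor(n / 2))] <- 1L
        w
    })
}

set.seed(1)   # arbitrary seed, for reproducibility only
n <- 10
B <- 3

## "MB"-style weights: B independent subsamples of half the data
folds_mb <- half_samples(n, B)

## "SS"-style weights: add the complement of every subsample, so that
## each pair of columns splits the observations into two disjoint halves
folds_ss <- cbind(folds_mb, 1L - folds_mb)

dim(folds_ss)
colSums(folds_ss)
```

For even n both halves of a pair have the same size. For odd n the complement in this sketch holds one observation more, which an actual implementation may handle differently.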