CV.CMB3S {gfboost}    R Documentation
Cross-validated version of CMB-3S
Description
Cross-validates the whole loss-based Stability Selection by aggregating several stable models according to their performance on validation sets. Also computes a cross-validated test loss on a disjoint test set.
Usage
CV.CMB3S(
D,
nsing,
Bsing = 1,
B = 100,
alpha = 1,
singfam = Gaussian(),
evalfam = Gaussian(),
sing = FALSE,
M = 10,
m_iter = 100,
kap = 0.1,
LS = FALSE,
best = 1,
wagg,
gridtype,
grid,
ncmb,
CVind,
targetfam = Gaussian(),
print = TRUE,
robagg = FALSE,
lower = 0,
singcoef = FALSE,
Mfinal = 10,
...
)
Arguments
D: Data matrix. Has to be an (n x (p+1))-dimensional data frame in the format (X, Y), where X contains the n observations of the p predictor variables and Y the responses in the last column.
nsing: Number of observations (rows) used for the SingBoost submodels.
Bsing: Number of subsamples based on which the SingBoost models are validated. Default is 1. Not to confuse with the parameter B, which refers to the validation of the CMB models.
B: Number of subsamples based on which the CMB models are validated. Default is 100. Not to confuse with Bsing, which refers to the validation of the SingBoost models.
alpha: Optional real number in (0,1]. Defines the fraction of best SingBoost models used in the aggregation. Default is 1 (all models are aggregated).
singfam: A SingBoost family. The SingBoost models are trained based on the corresponding loss function. Default is Gaussian().
evalfam: A SingBoost family. The SingBoost models are validated according to the corresponding loss function. Default is Gaussian().
sing: If sing=FALSE and singfam is a standard Boosting family from mboost, the aggregation procedure is executed for the corresponding standard Boosting models. Default is FALSE.
M: An integer between 2 and m_iter. Indicates that every M-th iteration is a singular iteration. Default is 10.
m_iter: Number of SingBoost iterations. Default is 100.
kap: Learning rate (step size). Must be a real number in (0,1]. Default is 0.1.
LS: If a family object already provided by mboost is used as singfam, the corresponding standard Boosting algorithm is performed in the singular iterations when LS is set to TRUE. Default is FALSE.
best: Needed in the case of localized ranking. The parameter K of the localized ranking loss is computed from best (the fraction of best instances to consider). Default is 1, corresponding to the standard hard ranking loss.
wagg: Type of row weight aggregation.
gridtype: Choose between 'pigrid' and 'qgrid'.
grid: The grid for the thresholds (in (0,1]) when gridtype='pigrid', resp. for the numbers of final variables (positive integers) when gridtype='qgrid'.
ncmb: Number of samples (rows) used for the CMB aggregation procedure.
CVind: A list where each element contains a vector of length n that partitions the rows of D into training, validation and (optionally) test sets for one cross-validation loop; see the sketch after this argument list.
targetfam: Target loss. Should be the same family as evalfam. Default is Gaussian().
print: If set to TRUE, progress information is printed during execution. Default is TRUE.
robagg: Optional. If set to TRUE, the best SingBoost models are excluded from the aggregation to avoid inlier effects. Only reasonable in combination with lower.
lower: Optional argument. Only reasonable when setting robagg=TRUE; defines the lower bound used in the robust aggregation. Default is 0.
singcoef: Default is FALSE. Then the coefficients of the candidate stable models are computed by standard linear regression; if set to TRUE, they are computed by SingBoost.
Mfinal: Optional. Necessary if singcoef=TRUE; determines the frequency of singular iterations in the SingBoost models used to compute the final coefficients. Default is 10.
...: Optional further arguments.
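The expected layout of CVind is easiest to see in code. The following is a minimal sketch; the assignment semantics (1 = training, 2 = validation, 3 = test) is an assumption for illustration and should be checked against the package manual:

## Hypothetical construction of CVind: one assignment vector of length n
## per outer cross-validation loop, disjointly splitting the rows of D
## into training (1), validation (2) and test (3) sets (assumed semantics).
n <- 100    # number of rows of D
ncv <- 5    # number of outer cross-validation loops
set.seed(1)
CVind <- lapply(seq_len(ncv), function(i) sample(rep(1:3, length.out = n)))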
Details
In CMB3S, a validation set is given, based on which the optimal stable model is chosen. The CV.CMB3S function adds an outer cross-validation step such that both the training and the validation data sets (and optionally the test data sets) are chosen randomly by disjointly dividing the initial data set. The aggregated stable models form an "ultra-stable" model. It is strongly recommended to run this function in a parallelized manner due to its huge computation time.
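As an illustration, the following is a minimal sketch of a complete call on simulated data. The data set, the CVind format and the option values "weights1" and "pigrid" are illustrative assumptions, not prescriptions from the package manual:

library(gfboost)
library(mboost)    # provides the Gaussian() family used below
set.seed(42)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
beta <- rnorm(p)
y <- as.numeric(X %*% beta) + rnorm(n)
D <- data.frame(X, y)    # predictors first, response in the last column

## assumed CVind format: one training/validation/test assignment per loop
CVind <- lapply(1:5, function(i) sample(rep(1:3, length.out = n)))

cv <- CV.CMB3S(D, nsing = 60, Bsing = 1, B = 100, alpha = 1,
               singfam = Gaussian(), evalfam = Gaussian(), sing = FALSE,
               M = 10, m_iter = 100, kap = 0.1, LS = FALSE, best = 1,
               wagg = "weights1",          # assumed option name
               gridtype = "pigrid",        # assumed option name
               grid = seq(0.5, 0.95, by = 0.05),
               ncmb = 80, CVind = CVind, targetfam = Gaussian())

Since the outer cross-validation loops are independent, a natural way to address the computation time is to distribute them over workers, e.g. by calling the function once per element of CVind (with CVind = list(ind)) inside parallel::mclapply and aggregating the returned losses and selection frequencies afterwards.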
Value
Cross-validated loss: A vector containing the cross-validated test losses.
Ultra-stable column measure: A vector containing the aggregated selection frequencies of the stable models.
References
Werner, T. (2020). Gradient-Free Gradient Boosting. PhD thesis, Carl von Ossietzky University of Oldenburg.