CMB {gfboost} | R Documentation |
CMB aggregation function
Description
Aggregates the selection frequencies of multiple SingBoost models. Use with caution, since there are no recommendations yet for good hyperparameter values.
Usage
CMB(
D,
nsing,
Bsing = 1,
alpha = 1,
singfam = Gaussian(),
evalfam = Gaussian(),
sing = FALSE,
M = 10,
m_iter = 100,
kap = 0.1,
LS = FALSE,
best = 1,
wagg,
robagg = FALSE,
lower = 0,
...
)
Arguments
D |
Data matrix. Has to be an |
nsing |
Number of observations (rows) used for the SingBoost submodels. |
Bsing |
Number of subsamples on which the SingBoost models are validated. Default is 1. Not to be confused with parameter |
alpha |
Optional real number in |
singfam |
A SingBoost family. The SingBoost models are trained based on the corresponding loss function. Default is |
evalfam |
A SingBoost family. The SingBoost models are validated according to the corresponding loss function. Default is |
sing |
If |
M |
An integer between 2 and |
m_iter |
Number of SingBoost iterations. Default is 100. |
kap |
Learning rate (step size). Must be a real number in |
LS |
If a |
best |
Needed in the case of localized ranking. The parameter |
wagg |
Type of row weight aggregation. |
robagg |
Optional. If setting |
lower |
Optional argument. Only reasonable when setting |
... |
Optional further arguments |
Details
SingBoost is designed to detect variables that standard Boosting procedures may miss but that may be relevant with respect to the target loss function. One may try to stabilize this "singular part" of the column measure by aggregating several SingBoost models: each model is evaluated on a validation set, and the selection frequencies are averaged, optionally weighted according to the validation losses. Warning: This procedure does not replace a Stability Selection!
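The aggregation idea described above can be illustrated with a minimal base-R sketch. It does not call gfboost and does not reproduce the package's internal weighting schemes. The toy selection frequencies, the validation losses, and the inverse-loss weights are all assumptions chosen only for illustration.

```r
# Toy selection frequencies: one row per SingBoost model, one column per variable.
# All numbers are made up for illustration.
freqs <- rbind(
  model1 = c(x1 = 0.2, x2 = 0.5, x3 = 0.3, x4 = 0.0),
  model2 = c(x1 = 0.1, x2 = 0.6, x3 = 0.0, x4 = 0.3),
  model3 = c(x1 = 0.3, x2 = 0.4, x3 = 0.3, x4 = 0.0)
)

# Hypothetical validation losses of the three models (lower is better).
val_loss <- c(model1 = 1.0, model2 = 2.0, model3 = 4.0)

# Unweighted aggregation: plain average of the selection frequencies.
agg_unweighted <- colMeans(freqs)

# Weighted aggregation (assumed scheme, not necessarily the package's):
# weights proportional to 1 / loss, normalized to sum to one,
# so models with a lower validation loss contribute more.
w <- (1 / val_loss) / sum(1 / val_loss)
agg_weighted <- drop(w %*% freqs)

print(round(agg_unweighted, 3))
print(round(agg_weighted, 3))
```

In this toy setting, the weighted average pulls the aggregated frequencies toward `model1`, the model with the smallest validation loss.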
Value
Column measure |
Aggregated column measure as |
Selected variables |
Names of the variables with positive aggregated column measure. |
Variables names |
Names of all variables including the intercept. |
Row measure |
Aggregated row measure as |
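The relation between the aggregated column measure and the selected variables can be sketched in base R as follows. The vector `col_measure` is a hypothetical stand-in with made-up values, not actual output of `CMB`.

```r
# Hypothetical aggregated column measure (made-up values, not package output).
col_measure <- c(x1 = 0.25, x2 = 0.55, x3 = 0.20, x4 = 0.00)

# Variables with a positive aggregated column measure count as selected.
selected <- names(col_measure)[col_measure > 0]
print(selected)
```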
References
Werner, T., Gradient-Free Gradient Boosting, PhD Thesis, Carl von Ossietzky University Oldenburg, 2020
Examples
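# Build a numeric design matrix from iris (Species is dummy-coded), drop the
# intercept column, and append the response as the last column, named "Y".
firis<-as.formula(Sepal.Length~.)
Xiris<-model.matrix(firis,iris)
Diris<-data.frame(Xiris[,-1],iris$Sepal.Length)
colnames(Diris)[6]<-"Y"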
set.seed(19931023)
cmb1<-CMB(Diris,nsing=100,Bsing=50,alpha=0.8,singfam=Rank(),
evalfam=Rank(),sing=TRUE,M=10,m_iter=100,
kap=0.1,LS=TRUE,wagg='weights1',robagg=FALSE,lower=0)
cmb1
# Same settings as cmb1 except M=2; only the first element of the result is printed.
set.seed(19931023)
cmb2<-CMB(Diris,nsing=100,Bsing=50,alpha=0.8,singfam=Rank(),
evalfam=Rank(),sing=TRUE,M=2,m_iter=100,
kap=0.1,LS=TRUE,wagg='weights1',robagg=FALSE,lower=0)
cmb2[[1]]
# Same settings as cmb1 except wagg='weights2'; only the first element of the result is printed.
set.seed(19931023)
cmb3<-CMB(Diris,nsing=100,Bsing=50,alpha=0.8,singfam=Rank(),
evalfam=Rank(),sing=TRUE,M=10,m_iter=100,
kap=0.1,LS=TRUE,wagg='weights2',robagg=FALSE,lower=0)
cmb3[[1]]