CMB3S {gfboost} | R Documentation |
Column Measure Boosting with SingBoost and Stability Selection (CMB-3S)
Description
Executes Column Measure Boosting (CMB) followed by the loss-based Stability Selection.
Usage
CMB3S(
Dtrain,
nsing,
Bsing = 1,
B = 100,
alpha = 1,
singfam = Gaussian(),
evalfam = Gaussian(),
sing = FALSE,
M = 10,
m_iter = 100,
kap = 0.1,
LS = FALSE,
best = 1,
wagg,
gridtype,
grid,
Dvalid,
ncmb,
robagg = FALSE,
lower = 0,
singcoef = FALSE,
Mfinal = 10,
...
)
Arguments
Dtrain |
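Data matrix. Has to be an n x (p+1)-dimensional data frame in the format (X, Y), i.e., with the response in the last column. The X-part must not contain an intercept column of ones, since this column is added automatically. |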
nsing |
Number of observations (rows) used for the SingBoost submodels. |
Bsing |
Number of subsamples based on which the SingBoost models are validated. Default is 1. Not to be confused with parameter B for the Stability Selection. |
B |
Number of subsamples based on which the CMB models are validated. Default is 100. Not to be confused with Bsing for the SingBoost models. |
alpha |
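Optional real number in (0,1]. Defines the fraction of best SingBoost models used in the aggregation step. Default is 1 (use all models). |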
singfam |
A SingBoost family. The SingBoost models are trained based on the corresponding loss function. Default is Gaussian() (squared loss). |
evalfam |
A SingBoost family. The SingBoost models are validated according to the corresponding loss function. Default is Gaussian() (squared loss). |
sing |
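If sing=FALSE and the singfam family is a standard Boosting family contained in the package mboost, the CMB aggregation procedure is executed for the corresponding standard Boosting models. |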
M |
An integer between 2 and m_iter. A singular iteration is performed in every M-th iteration. Default is 10. |
m_iter |
Number of SingBoost iterations. Default is 100. |
kap |
Learning rate (step size). Must be a real number in (0,1]. Default is 0.1. |
LS |
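If a SingBoost family that is already provided by mboost is used, the respective Boosting algorithm is performed in the singular iterations when LS=TRUE. Default is FALSE. |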
best |
Needed in the case of localized ranking. The parameter K of the localized ranking loss is computed as best*n, rounded up to the next integer. |
wagg |
Type of row weight aggregation, for example 'weights1' or 'weights2' as used in the examples below. |
gridtype |
Choose between 'pigrid' (thresholds on the selection frequencies) and 'qgrid' (numbers of selected columns). |
grid |
The grid for the thresholds (in (0,1]) or for the numbers of final variables (positive integers), depending on gridtype. |
Dvalid |
Validation data for selecting the optimal element of the grid and with it the best corresponding model. |
ncmb |
Number of samples used for CMB. |
robagg |
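Optional. If robagg=TRUE, the best SingBoost models are ignored in the aggregation step to avoid inlier effects. Only reasonable in combination with lower. |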
lower |
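Optional argument. Only reasonable when setting robagg=TRUE. lower is a real number in [0,1) (a rather small value is recommended); the best lower-fraction of the SingBoost models is then excluded from the aggregation. |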
singcoef |
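Default is FALSE. In that case the coefficients of the candidate stable models are computed by standard linear regression (provided that, for each grid element, the number of selected columns is smaller than the number of training samples). If singcoef=TRUE, the coefficients are computed by SingBoost. |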
Mfinal |
Optional. Necessary if singcoef=TRUE to determine the frequency of singular iterations in the final SingBoost models. |
... |
Optional further arguments |
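The arguments alpha, robagg, lower and best are only described briefly on this page. The following base R sketch shows one plausible reading of these arguments. In this reading, the Bsing SingBoost models are ranked by validation loss and the best alpha-fraction is kept. With robagg=TRUE, the very best lower-fraction is additionally dropped. The localized ranking parameter K is taken as best*n rounded up. The helpers select_models() and loc_rank_K() are hypothetical and not part of gfboost, and the rounding rules are assumptions.

```r
# Hypothetical helpers, NOT part of gfboost: one plausible reading of
# alpha, robagg/lower and best as described in the argument list.
select_models <- function(losses, alpha = 1, robagg = FALSE, lower = 0) {
  stopifnot(alpha > 0, alpha <= 1, lower >= 0, lower < 1)
  n_models <- length(losses)
  ord <- order(losses)                  # model indices, smallest validation loss first
  n_keep <- ceiling(alpha * n_models)   # keep the best alpha-fraction of the models
  keep <- ord[seq_len(n_keep)]
  if (robagg && lower > 0) {
    # Assumption: robagg discards the very best lower-fraction of the models
    n_drop <- floor(lower * n_models)
    keep <- setdiff(keep, ord[seq_len(n_drop)])
  }
  sort(keep)
}

# Assumption: K for the localized ranking loss is best * n, rounded up
loc_rank_K <- function(best, n) ceiling(best * n)

# Usage: four SingBoost models with their validation losses
losses <- c(0.5, 0.1, 0.3, 0.9)
select_models(losses, alpha = 0.5)                               # returns 2 3
select_models(losses, alpha = 0.5, robagg = TRUE, lower = 0.25)  # returns 3
loc_rank_K(best = 0.5, n = 7)                                    # returns 4
```

Ranking by loss and cutting by fraction keeps the rule independent of the number of subsamples. For this reason, alpha and lower are stated as fractions rather than counts.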
Details
See CMB and CMB.Stabsel.
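According to the Dvalid argument, the validation data select the optimal grid element and, with it, the best corresponding model. The following base R sketch illustrates such a loss-based selection. It assumes squared loss, least-squares refits with an intercept, and plain numeric design matrices. The threshold branch mirrors gridtype='pigrid'. The second branch, called 'qgrid' here, keeps the g most frequently selected columns; this behaviour is an assumption. loss_based_stabsel() is a hypothetical helper, not the gfboost implementation.

```r
# Hypothetical sketch, NOT the gfboost implementation of CMB3S or CMB.Stabsel.
# freq: selection frequencies of the p columns (the stable column measure)
# Xtrain, Xvalid: numeric matrices without intercept column; ytrain, yvalid: responses
loss_based_stabsel <- function(freq, Xtrain, ytrain, Xvalid, yvalid,
                               grid, gridtype = c("pigrid", "qgrid")) {
  gridtype <- match.arg(gridtype)
  winner <- list(loss = Inf)
  for (g in grid) {
    sel <- if (gridtype == "pigrid") {
      which(freq >= g)                          # columns whose frequency reaches the threshold
    } else {
      head(order(freq, decreasing = TRUE), g)   # assumed: the g most frequently selected columns
    }
    if (length(sel) == 0) next                  # empty model: skip this grid element
    coefs <- qr.coef(qr(cbind(1, Xtrain[, sel, drop = FALSE])), ytrain)  # least-squares refit
    pred <- drop(cbind(1, Xvalid[, sel, drop = FALSE]) %*% coefs)
    loss <- mean((yvalid - pred)^2)             # squared validation loss
    if (loss < winner$loss) {
      winner <- list(loss = loss, grid = g, sel = sel, coef = coefs)
    }
  }
  winner
}

# Usage with simulated data in which only the first column matters
set.seed(1)
X <- matrix(rnorm(150), ncol = 3)
y <- 2 * X[, 1] + rnorm(50, sd = 0.1)
res <- loss_based_stabsel(freq = c(0.95, 0.2, 0.1), X[1:40, ], y[1:40],
                          X[41:50, ], y[41:50], grid = c(0.5, 0.9))
res$sel   # returns 1
```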
Value
Final coefficients |
The coefficients corresponding to the optimal stable model as a vector. |
Stable column measure |
Aggregated empirical column measure (i.e., selection frequencies) as a vector. |
Selected columns |
The column numbers of the variables that form the best stable model as a vector. |
Used row measure |
Aggregated empirical row measure (i.e., row weights) as a vector. |
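The stable column measure is described above as selection frequencies, and the row measure as aggregated row weights. The following sketch shows how such empirical measures can be computed from B fitted models. The helper empirical_measures() and its list-based inputs are hypothetical. Normalising the row weights to sum to one is also an assumption and not taken from gfboost.

```r
# Hypothetical sketch, NOT the gfboost implementation.
# sel_cols:  list with the indices of the columns selected by each of the B models
# used_rows: list with the indices of the rows used by each of the B models
empirical_measures <- function(sel_cols, used_rows, p, n) {
  B <- length(sel_cols)
  # fraction of models selecting each column (assumes no duplicates within a model)
  col_measure <- tabulate(unlist(sel_cols), nbins = p) / B
  # how often each row was used, normalised to sum to one (assumption)
  row_counts <- tabulate(unlist(used_rows), nbins = n)
  list(col_measure = col_measure, row_measure = row_counts / sum(row_counts))
}

m <- empirical_measures(sel_cols = list(c(1, 2), 1, c(1, 3)),
                        used_rows = list(1:2, 2:3, c(1, 3)), p = 4, n = 3)
m$col_measure   # 1, 1/3, 1/3, 0
m$row_measure   # 1/3, 1/3, 1/3
```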
References
T. Werner. Gradient-Free Gradient Boosting. PhD thesis, Carl von Ossietzky University Oldenburg, 2020.
T. Hothorn, P. Bühlmann, T. Kneib, M. Schmid, and B. Hofner. mboost: Model-Based Boosting, 2017.
B. Hofner and T. Hothorn. stabs: Stability Selection with Error Control, 2017.
B. Hofner, L. Boccuto, and M. Göker. Controlling false discoveries in high-dimensional situations: Boosting with stability selection. BMC Bioinformatics, 16(1):144, 2015.
N. Meinshausen and P. Bühlmann. Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4):417–473, 2010.
Examples
library(gfboost)
library(mboost)  # provides glmboost(), used below for comparison
firis<-as.formula(Sepal.Length~.)
Xiris<-model.matrix(firis,iris)
Diris<-data.frame(Xiris[,-1],iris$Sepal.Length)
colnames(Diris)[6]<-"Y"
set.seed(19931023)
ind<-sample(1:150,120,replace=FALSE)
Dtrain<-Diris[ind,]
Dvalid<-Diris[-ind,]
set.seed(19931023)
cmb3s<-CMB3S(Dtrain,nsing=120,Dvalid=Dvalid,ncmb=120,Bsing=1,B=1,alpha=1,singfam=Gaussian(),
evalfam=Gaussian(),sing=FALSE,M=10,m_iter=100,kap=0.1,LS=FALSE,wagg='weights1',
gridtype='pigrid',grid=seq(0.8,0.9,1),robagg=FALSE,lower=0,singcoef=TRUE,Mfinal=10)
cmb3s$Fin
cmb3s$Stab
cmb3s$Sel
glmres4<-glmboost(Sepal.Length~.,iris[ind,])
coef(glmres4)
set.seed(19931023)
cmb3s1<-CMB3S(Dtrain,nsing=80,Dvalid=Dvalid,ncmb=100,Bsing=10,B=100,alpha=0.5,singfam=Gaussian(),
evalfam=Gaussian(),sing=FALSE,M=10,m_iter=100,kap=0.1,LS=FALSE,wagg='weights1',gridtype='pigrid',
grid=seq(0.8,0.9,1),robagg=FALSE,lower=0,singcoef=TRUE,Mfinal=10)
cmb3s1$Fin
cmb3s1$Stab
## This may take around a minute
set.seed(19931023)
cmb3s2<-CMB3S(Dtrain,nsing=80,Dvalid=Dvalid,ncmb=100,Bsing=10,B=100,alpha=0.5,singfam=Rank(),
evalfam=Rank(),sing=TRUE,M=10,m_iter=100,kap=0.1,LS=TRUE,wagg='weights2',gridtype='pigrid',
grid=seq(0.8,0.9,1),robagg=FALSE,lower=0,singcoef=TRUE,Mfinal=10)
cmb3s2$Fin
cmb3s2$Stab
set.seed(19931023)
cmb3s3<-CMB3S(Dtrain,nsing=80,Dvalid=Dvalid,ncmb=100,Bsing=10,B=100,alpha=0.5,singfam=Huber(),
evalfam=Huber(),sing=FALSE,M=10,m_iter=100,kap=0.1,LS=FALSE,wagg='weights2',gridtype='pigrid',
grid=seq(0.8,0.9,1),robagg=FALSE,lower=0,singcoef=FALSE,Mfinal=10)
cmb3s3$Fin
cmb3s3$Stab
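All examples above pass grid=seq(0.8,0.9,1). The step 1 exceeds the length of the interval, so base R's seq() returns only the starting value 0.8, and the examples effectively use a single threshold. The page does not say whether this is intended. A grid with several thresholds can be built with a smaller step, as sketched below; the chosen range is purely illustrative.

```r
# seq(from, to, by) never exceeds 'to', so a step of 1 on [0.8, 0.9]
# yields only the starting value
single <- seq(0.8, 0.9, 1)
single                               # returns 0.8

# an illustrative grid of four thresholds, 0.6 to 0.9
grid_pi <- seq(0.6, 0.9, by = 0.1)
length(grid_pi)                      # returns 4
```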