rboost {ranktreeEnsemble}		R Documentation
Generalized Boosted Modeling via Rank-Based Trees for Single Sample Classification with Gene Expression Profiles
Description
The function fits generalized boosted models via rank-based trees for both binary and multi-class classification problems. It converts continuous gene expression profiles into ranked gene pairs, for which variable importance indices are computed and used for dimension reduction. The boosting implementation is imported directly from the gbm package. For technical details, see the vignette: utils::browseVignettes("gbm").
Usage
rboost(
formula,
data,
dimreduce = TRUE,
datrank = TRUE,
distribution = "multinomial",
weights,
ntree = 100,
nodedepth = 3,
nodesize = 5,
shrinkage = 0.05,
bag.fraction = 0.5,
train.fraction = 1,
cv.folds = 5,
keep.data = TRUE,
verbose = TRUE,
class.stratify.cv = TRUE,
n.cores = NULL
)
Arguments
formula
Object of class 'formula' describing the model to fit.

data
Data frame containing the y-outcome and x-variables.

dimreduce
Whether to perform dimension reduction via variable-importance-weighted forests.

datrank
Whether to use ranked raw data when fitting the dimension reduction model.

distribution
Character string specifying the name of the distribution to use. If the response has only 2 unique values, "bernoulli" is assumed; otherwise "multinomial" (the default) is used.

weights
Optional vector of weights to be used in the fitting process. The weights must be positive but need not be normalized.

ntree
Integer specifying the total number of trees to fit. This is equivalent to the number of iterations and the number of basis functions in the additive expansion, and matches n.trees in the gbm package.

nodedepth
Integer specifying the maximum depth of each tree. A value of 1 implies an additive model. This matches interaction.depth in the gbm package.

nodesize
Integer specifying the minimum number of observations in the terminal nodes of the trees, which matches n.minobsinnode in the gbm package.

shrinkage
Shrinkage parameter applied to each tree in the expansion, also known as the learning rate or step-size reduction; values between 0.001 and 0.1 usually work, but a smaller learning rate typically requires more trees. Default is 0.05.

bag.fraction
Fraction of the training set observations randomly selected to propose the next tree in the expansion. This introduces randomness into the model fit; if bag.fraction < 1, running the same model twice will produce similar but different fits.

train.fraction
The first train.fraction * nrow(data) observations are used to fit the model, and the remainder are used to compute out-of-sample estimates of the loss function.

cv.folds
Number of cross-validation folds to perform. If cv.folds > 1, cross-validation is performed in addition to the usual fit and an estimate of generalization error is returned in cv.error.

keep.data
Logical indicating whether to keep the data and an index of the data stored with the object. Keeping the data and index makes subsequent calls on the fitted object faster, at the cost of a larger stored object.

verbose
Logical indicating whether or not to print out progress and performance indicators. Default is TRUE.

class.stratify.cv
Logical indicating whether or not the cross-validation should be stratified by class. Stratifying the cross-validation helps avoid situations in which training sets do not contain all classes.

n.cores
Number of CPU cores to use. The cross-validation loop will attempt to send different CV folds off to different cores. If NULL, the number of available cores is detected automatically.
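As a sketch of how the shrinkage/ntree trade-off described above might be exercised (all argument names are documented on this page; the tnbc data set ships with the package):

```r
library(ranktreeEnsemble)
data(tnbc)

# A smaller learning rate typically requires more trees: here the
# defaults (shrinkage = 0.05, ntree = 100) are traded for a slower,
# longer expansion on the same columns used in the Examples section.
obj.slow <- rboost(subtype ~ ., data = tnbc[, c(1:10, 337)],
                   shrinkage = 0.01, ntree = 500)
```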
Value
fit
A vector containing the fitted values on the scale of the regression function (e.g. the log-odds scale for bernoulli).

train.error
A vector of length equal to the number of fitted trees containing the value of the loss function for each boosting iteration evaluated on the training data.

valid.error
A vector of length equal to the number of fitted trees containing the value of the loss function for each boosting iteration evaluated on the validation data.

cv.error
If cv.folds > 1, a vector of length equal to the number of fitted trees containing the cross-validated estimate of the loss function for each boosting iteration.

oobag.improve
A vector of length equal to the number of fitted trees containing an out-of-bag estimate of the marginal reduction in the expected value of the loss function. The out-of-bag estimate uses only the training data and is useful for estimating the optimal number of boosting iterations. See the gbm package documentation for details.

cv.fitted
If cross-validation was performed, the cross-validation predicted values on the scale of the linear predictor, i.e. the fitted values from the i-th CV fold for the model trained on the data in all other folds.
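Given the returned components above, one way to pick the number of boosting iterations is to minimize the cross-validation error; a minimal sketch, assuming obj was fitted with cv.folds > 1:

```r
# Iteration with the smallest cross-validated loss
best.iter <- which.min(obj$cv.error)

# Compare training and cross-validation error curves across iterations
plot(obj$train.error, type = "l", xlab = "iteration", ylab = "loss")
lines(obj$cv.error, lty = 2)
```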
Author(s)
Ruijie Yin (Maintainer, <ruijieyin428@gmail.com>), Chen Ye, and Min Lu
References
Lu M., Yin R., and Chen X.S. (2024). Ensemble Methods of Rank-Based Trees for Single Sample Classification with Gene Expression Profiles. Journal of Translational Medicine 22, 140. doi:10.1186/s12967-024-04940-2
Examples
data(tnbc)
obj <- rboost(subtype~., data = tnbc[,c(1:10,337)])
obj
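A prediction step could follow the fit above; this is a sketch that assumes a predict method is available for rboost objects, as is conventional for fitted-model objects in R (the newdata argument name is an assumption, not confirmed by this page):

```r
# Predict classes for the training samples (illustration only;
# in practice predict on held-out data)
pred <- predict(obj, newdata = tnbc[, 1:10])
```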