Rase {RaSEn} R Documentation
Construct the random subspace ensemble classifier.
Description
RaSE is a general ensemble classification framework for solving sparse classification problems. In the RaSE algorithm, for each of the B1 weak learners, B2 random subspaces are generated and the optimal one is chosen to train the model on the basis of some criterion.
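At a high level, the procedure can be sketched as follows (an illustrative R sketch, not the package implementation; criterion_fn and fit_fn are hypothetical placeholders for the chosen criterion and base classifier):

rase_sketch <- function(xtrain, ytrain, B1, B2, D, criterion_fn, fit_fn) {
  lapply(seq_len(B1), function(b) {
    # draw B2 random subspaces: first a size from 1..D, then that many features
    candidates <- lapply(seq_len(B2), function(j) {
      sample(ncol(xtrain), size = sample(D, 1))
    })
    # score every candidate subspace and keep the one optimizing the criterion
    scores <- vapply(candidates, function(S) {
      criterion_fn(xtrain[, S, drop = FALSE], ytrain)
    }, numeric(1))
    best <- candidates[[which.min(scores)]]
    # train the weak learner on the selected subspace
    list(subspace = best, fit = fit_fn(xtrain[, best, drop = FALSE], ytrain))
  })
}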
Usage
Rase(
xtrain,
ytrain,
xval = NULL,
yval = NULL,
B1 = 200,
B2 = 500,
D = NULL,
dist = NULL,
base = NULL,
super = list(type = c("separate"), base.update = TRUE),
criterion = NULL,
ranking = TRUE,
k = c(3, 5, 7, 9, 11),
cores = 1,
seed = NULL,
iteration = 0,
cutoff = TRUE,
cv = 5,
scale = FALSE,
C0 = 0.1,
kl.k = NULL,
lower.limits = NULL,
upper.limits = NULL,
weights = NULL,
...
)
Arguments
xtrain: n * p observation matrix; n observations, p features.
ytrain: vector of n 0/1 observations (class labels).
xval: observation matrix for validation. Default = NULL.
yval: 0/1 observations for validation. Default = NULL.
B1: the number of weak learners. Default = 200.
B2: the number of subspace candidates generated for each weak learner. Default = 500.
D: the maximal subspace size when generating random subspaces. Default = NULL.
dist: the distribution for features when generating random subspaces. Default = NULL.
base:
the type of base classifier. Default = 'lda'. Can be a single string chosen from the available options, a string vector, or a named probability vector. When it indicates a single type of base classifier, the classical RaSE model (Tian, Y. and Feng, Y., 2021(b)) is fitted. When it is a string vector that includes multiple base classifier types, a super RaSE model (Zhu, J. and Feng, Y., 2021) is fitted by sampling base classifiers with equal probability. It can also be a probability vector with names corresponding to the classifier types, in which case a super RaSE model is trained by sampling base classifiers with the given probabilities. See the sketch after this argument list.
super: a list of control parameters for super RaSE (Zhu, J. and Feng, Y., 2021). Not used when base is a single string. Should be a list with components type and base.update (see Usage for the defaults); base.update indicates whether the sampling distribution over base classifier types is updated during the iterations.
criterion: the criterion to choose the best subspace for each weak learner. For the classical RaSE (when base is a single string), the criterion is evaluated on each of the B2 candidate subspaces and the subspace optimizing it is kept; the examples below use 'ric', 'bic', 'loo', 'training' and 'cv'. Default = NULL.
ranking: whether the function outputs the selected percentage of each feature among the B1 subspaces. Logical, default = TRUE.
k: the number of nearest neighbors considered when base = 'knn'. Default = c(3, 5, 7, 9, 11).
cores: the number of cores used for parallel computing. Default = 1.
seed: the random seed assigned at the start of the algorithm, which can be a real number or NULL. Default = NULL.
iteration: the number of iterations. Default = 0.
cutoff: whether to use the empirically optimal threshold. Logical, default = TRUE. If FALSE, the threshold is set to 0.5.
cv: the number of cross-validation folds. Default = 5. Only useful when criterion = 'cv'.
scale: whether to normalize the data. Logical, default = FALSE.
C0: a positive constant used in the iteration step (when iteration > 0); see Tian, Y. and Feng, Y., 2021(b) for details. Default = 0.1.
kl.k: the number of nearest neighbors used to estimate RIC in a non-parametric way. Default = NULL.
lower.limits: the vector of lower limits for each coefficient in logistic regression (only used when base = 'logistic'). Should be a vector of length equal to the number of variables (the column number of xtrain). Default = NULL.
upper.limits: the vector of upper limits for each coefficient in logistic regression (only used when base = 'logistic'). Should be a vector of length equal to the number of variables (the column number of xtrain). Default = NULL.
weights: observation weights. Should be a vector of length equal to the training sample size (the length of ytrain). Default = NULL.
...: additional arguments.
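As a quick illustration of how base and super can be specified (a sketch; the classifier names follow the examples below):

# Classical RaSE: a single base classifier type.
base_single <- "lda"
# Super RaSE: several types sampled with equal probability.
base_multi <- c("knn", "lda", "logistic")
# Super RaSE: types sampled with the given probabilities (names identify the type).
base_prob <- c(randomforest = 0.2, lda = 0.5, svm = 0.3)
# Control list for super RaSE, matching the Usage defaults.
super_ctrl <- list(type = "separate", base.update = TRUE)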
Value
An object with S3 class 'RaSE' if base indicates a single base classifier, with the following components:
marginal: the marginal probability for each class.
base: the type of base classifier.
criterion: the criterion to choose the best subspace for each weak learner.
B1: the number of weak learners.
B2: the number of subspace candidates generated for each weak learner.
D: the maximal subspace size when generating random subspaces.
iteration: the number of iterations.
fit.list: sequence of B1 fitted base classifiers.
cutoff: the empirically optimal threshold.
subspace: sequence of subspaces corresponding to the B1 weak learners.
ranking: the selected percentage of each feature among the B1 subspaces.
scale: a list of scaling parameters, including the scaling center and the scale parameter for each feature. Equals NULL when scale = FALSE.
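For instance, a fitted 'RaSE' object can be inspected as follows (a sketch reusing xtrain and ytrain from the Examples section; component names as listed above):

fit <- Rase(xtrain, ytrain, B1 = 50, B2 = 50, base = "lda", criterion = "ric")
head(sort(fit$ranking, decreasing = TRUE))  # most frequently selected features
fit$cutoff                                  # empirically optimal threshold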
An object with S3 class 'super_RaSE' if base includes multiple base classifiers or the sampling probabilities of multiple classifiers, with the following components:
marginal: the marginal probability for each class.
base: the list of B1 base classifier types.
criterion: the criterion to choose the best subspace for each weak learner.
B1: the number of weak learners.
B2: the number of subspace candidates generated for each weak learner.
D: the maximal subspace size when generating random subspaces.
iteration: the number of iterations.
fit.list: sequence of B1 fitted base classifiers.
cutoff: the empirically optimal threshold.
subspace: sequence of subspaces corresponding to the B1 weak learners.
ranking.feature: the selected percentage of each feature corresponding to each type of classifier.
ranking.base: the selected percentage of each classifier type among the selected B1 learners.
scale: a list of scaling parameters, including the scaling center and the scale parameter for each feature. Equals NULL when scale = FALSE.
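Similarly, for a 'super_RaSE' object (a sketch under the same assumptions):

fit <- Rase(xtrain, ytrain, B1 = 100, B2 = 100,
    base = c("knn", "lda", "logistic"), criterion = "cv", cv = 5)
fit$ranking.base     # selected percentage of each classifier type
fit$ranking.feature  # feature selection percentages per classifier type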
Author(s)
Ye Tian (maintainer, ye.t@columbia.edu) and Yang Feng. The authors thank Yu Cao (Exeter Finance) and his team for many helpful suggestions and discussions.
References
Tian, Y. and Feng, Y., 2021(a). RaSE: A variable screening framework via random subspace ensembles. Journal of the American Statistical Association, (just-accepted), pp.1-30.
Tian, Y. and Feng, Y., 2021(b). RaSE: Random subspace ensemble classification. Journal of Machine Learning Research, 22(45), pp.1-93.
Zhu, J. and Feng, Y., 2021. Super RaSE: Super Random Subspace Ensemble Classification. https://www.preprints.org/manuscript/202110.0042
Chen, J. and Chen, Z., 2008. Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3), pp.759-771.
Chen, J. and Chen, Z., 2012. Extended BIC for small-n-large-P sparse GLM. Statistica Sinica, pp.555-574.
Akaike, H., 1973. Information theory and an extension of the maximum likelihood principle. In 2nd International Symposium on Information Theory, 1973 (pp. 267-281). Akademiai Kiado.
Schwarz, G., 1978. Estimating the dimension of a model. The Annals of Statistics, 6(2), pp.461-464.
See Also
predict.RaSE, RaModel, print.RaSE, print.super_RaSE, RaPlot, RaScreen.
Examples
set.seed(0, kind = "L'Ecuyer-CMRG")
train.data <- RaModel("classification", 1, n = 100, p = 50)
test.data <- RaModel("classification", 1, n = 100, p = 50)
xtrain <- train.data$x
ytrain <- train.data$y
xtest <- test.data$x
ytest <- test.data$y
# test RaSE classifier with LDA base classifier
fit <- Rase(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 0, base = 'lda',
cores = 2, criterion = 'ric')
mean(predict(fit, xtest) != ytest)
## Not run:
# test RaSE classifier with LDA base classifier and 1 iteration round
fit <- Rase(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 1, base = 'lda',
cores = 2, criterion = 'ric')
mean(predict(fit, xtest) != ytest)
# test RaSE classifier with QDA base classifier and 1 iteration round
fit <- Rase(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 1, base = 'qda',
cores = 2, criterion = 'ric')
mean(predict(fit, xtest) != ytest)
# test RaSE classifier with kNN base classifier
fit <- Rase(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 0, base = 'knn',
cores = 2, criterion = 'loo')
mean(predict(fit, xtest) != ytest)
# test RaSE classifier with logistic regression base classifier
fit <- Rase(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 0, base = 'logistic',
cores = 2, criterion = 'bic')
mean(predict(fit, xtest) != ytest)
# test RaSE classifier with SVM base classifier
fit <- Rase(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 0, base = 'svm',
cores = 2, criterion = 'training')
mean(predict(fit, xtest) != ytest)
# test RaSE classifier with random forest base classifier
fit <- Rase(xtrain, ytrain, B1 = 20, B2 = 10, iteration = 0, base = 'randomforest',
cores = 2, criterion = 'cv', cv = 3)
mean(predict(fit, xtest) != ytest)
# fit a super RaSE classifier by sampling the base learner from kNN, LDA and
# logistic regression with equal probability
fit <- Rase(xtrain = xtrain, ytrain = ytrain, B1 = 100, B2 = 100,
    base = c("knn", "lda", "logistic"), super = list(type = "separate", base.update = TRUE),
    criterion = "cv", cv = 5, iteration = 1, cores = 2)
mean(predict(fit, xtest) != ytest)
# fit a super RaSE classifier by sampling the base learner from random forest,
# LDA and SVM with probabilities 0.2, 0.5 and 0.3
fit <- Rase(xtrain = xtrain, ytrain = ytrain, B1 = 100, B2 = 100,
    base = c(randomforest = 0.2, lda = 0.5, svm = 0.3),
    super = list(type = "separate", base.update = FALSE),
    criterion = "cv", cv = 5, iteration = 0, cores = 2)
mean(predict(fit, xtest) != ytest)
## End(Not run)