RaScreen {RaSEn} | R Documentation |
Variable screening via RaSE.
Description
RaSE is a general framework for variable screening. To select each of the B1 subspaces, RaSE screening generates B2 random candidate subspaces and keeps the best one according to a given criterion. The proportions (equivalently, percentages) of the B1 selected subspaces that contain each variable are then used as an importance measure to rank the variables.
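The procedure above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the RaSEn implementation: the helper `rase_screen_sketch` is hypothetical, each candidate subspace size is drawn uniformly from 1 to D, features are drawn uniformly without replacement, and the best candidate is the one with the smallest BIC of an ordinary least-squares fit.

```r
# Hypothetical sketch of RaSE screening (not the RaSEn code).
# Assumptions: subspace size ~ Uniform{1, ..., D}, features sampled uniformly
# without replacement, best candidate = smallest BIC of an lm() fit.
rase_screen_sketch <- function(x, y, B1 = 50, B2 = 20, D = 3) {
  p <- ncol(x)
  counts <- numeric(p)
  for (b1 in seq_len(B1)) {
    best_bic <- Inf
    best_set <- NULL
    for (b2 in seq_len(B2)) {
      d <- sample.int(D, 1)              # random subspace size in 1..D
      s <- sort(sample.int(p, d))        # random feature subset of size d
      fit <- lm(y ~ x[, s, drop = FALSE])
      bic <- BIC(fit)
      if (bic < best_bic) {              # keep the best of the B2 candidates
        best_bic <- bic
        best_set <- s
      }
    }
    counts[best_set] <- counts[best_set] + 1   # record the selected subspace
  }
  counts / B1   # selected proportion of each feature over the B1 subspaces
}

set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- 2 * x[, 1] - 2 * x[, 2] + rnorm(n)   # only features 1 and 2 matter
perc <- rase_screen_sketch(x, y, B1 = 50, B2 = 20, D = 3)
head(order(perc, decreasing = TRUE))      # features ranked by selected proportion
```

Ranking by `perc` is the screening step; the true signals should appear near the top of the ordering.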
Usage
RaScreen(
xtrain,
ytrain,
xval = NULL,
yval = NULL,
B1 = 200,
B2 = NULL,
D = NULL,
dist = NULL,
model = NULL,
criterion = NULL,
k = 5,
cores = 1,
seed = NULL,
iteration = 0,
cv = 5,
scale = FALSE,
C0 = 0.1,
kl.k = NULL,
classification = NULL,
...
)
Arguments
xtrain |
n * p observation matrix. n observations, p features. |
ytrain |
n 0/1 observations. |
xval |
observation matrix for validation. Default = NULL. |
yval |
0/1 observations for validation. Default = NULL. |
B1 |
the number of weak learners. Default = 200. |
B2 |
the number of subspace candidates generated for each weak learner. Default = NULL. |
D |
the maximal subspace size when generating random subspaces. Default = NULL. |
dist |
the distribution for features when generating random subspaces. Default = NULL. |
model |
the model to use. Default = 'lda' when |
criterion |
the criterion to choose the best subspace. Default = 'ric' when |
k |
the number of nearest neighbors considered when |
cores |
the number of cores used for parallel computing. Default = 1. |
seed |
the random seed assigned at the start of the algorithm, which can be a real number or NULL. |
iteration |
the number of iterations. Default = 0. |
cv |
the number of cross-validations used. Default = 5. Only useful when |
scale |
whether to normalize the data. Logical, default = FALSE. |
C0 |
a positive constant used when |
kl.k |
the number of nearest neighbors used to estimate RIC in a non-parametric way. Default = NULL. |
classification |
the indicator of the problem type, which can be TRUE, FALSE or NULL. |
... |
additional arguments. |
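How B2, D and dist interact when candidate subspaces are generated can be illustrated with a short base-R sketch. It assumes, as an illustration rather than a description of RaSEn internals, that each candidate's size is drawn uniformly from 1 to D and that its features are sampled without replacement with probabilities proportional to dist; `draw_subspaces` is a hypothetical helper.

```r
# Hypothetical helper: generate B2 candidate subspaces of size at most D,
# sampling features with probabilities proportional to `dist`.
draw_subspaces <- function(B2, p, D, dist = rep(1 / p, p)) {
  lapply(seq_len(B2), function(i) {
    d <- sample.int(D, 1)                  # subspace size in 1..D
    sort(sample.int(p, d, prob = dist))    # features weighted by dist, no repeats
  })
}

set.seed(2)
# Feature 1 gets half of the sampling mass; the other 7 share the rest.
subs <- draw_subspaces(B2 = 10, p = 8, D = 3, dist = c(0.5, rep(0.5 / 7, 7)))
subs
```

With a uniform `dist` every feature is equally likely to enter a candidate; a non-uniform `dist` concentrates candidates on features believed to be important.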
Value
A list including the following items.
model |
the model used in RaSE screening. |
criterion |
the criterion to choose the best subspace for each weak learner. |
B1 |
the number of selected subspaces. |
B2 |
the number of subspace candidates generated for each of B1 subspaces. |
n |
the sample size. |
p |
the dimension of data. |
D |
the maximal subspace size when generating random subspaces. |
iteration |
the number of iterations. |
selected.perc |
A list of length ( |
scale |
a list of scaling parameters, including the scaling center and the scale parameter for each feature. Equal to |
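The examples pass the fitted object to RaRank, which selects variables from these selected percentages. As a hedged illustration of that ranking step, the sketch below orders features by selected proportion and keeps the top floor(n / log(n)); the helper `rank_by_perc` is hypothetical, and rounding n / log(n) down is an assumption about the "n/logn" rule.

```r
# Hypothetical helper: rank features by selected proportion and keep the
# top floor(n / log(n)) of them (rounding down is an assumption).
rank_by_perc <- function(perc, n) {
  k <- min(length(perc), floor(n / log(n)))
  order(perc, decreasing = TRUE)[seq_len(k)]
}

set.seed(3)
perc <- runif(100)                # stand-in for one element of selected.perc
top <- rank_by_perc(perc, n = 100)
length(top)                       # floor(100 / log(100)) features kept
```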
References
Tian, Y. and Feng, Y., 2021(a). RaSE: A variable screening framework via random subspace ensembles. Journal of the American Statistical Association, (just-accepted), pp.1-30.
Tian, Y. and Feng, Y., 2021(b). RaSE: Random subspace ensemble classification. Journal of Machine Learning Research, 22(45), pp.1-93.
Chen, J. and Chen, Z., 2008. Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3), pp.759-771.
Chen, J. and Chen, Z., 2012. Extended BIC for small-n-large-P sparse GLM. Statistica Sinica, pp.555-574.
Schwarz, G., 1978. Estimating the dimension of a model. The Annals of Statistics, 6(2), pp.461-464.
See Also
Examples
set.seed(0, kind = "L'Ecuyer-CMRG")
train.data <- RaModel("screening", 1, n = 100, p = 100)
xtrain <- train.data$x
ytrain <- train.data$y
# test RaSE screening with linear regression model and BIC
fit <- RaScreen(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 0, model = 'lm',
cores = 2, criterion = 'bic')
# Select D variables
RaRank(fit, selected.num = "D")
## Not run:
# test RaSE screening with knn model and 5-fold cross-validation MSE
fit <- RaScreen(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 0, model = 'knn',
cores = 2, criterion = 'cv', cv = 5)
# Select n/logn variables
RaRank(fit, selected.num = "n/logn")
# test RaSE screening with SVM and 5-fold cross-validation MSE
fit <- RaScreen(xtrain, ytrain, B1 = 100, B2 = 50, iteration = 0, model = 'svm',
cores = 2, criterion = 'cv', cv = 5)
# Select n/logn variables
RaRank(fit, selected.num = "n/logn")
# test RaSE screening with logistic regression model and eBIC (gam = 0.5). Set iteration number = 1
train.data <- RaModel("screening", 6, n = 100, p = 100)
xtrain <- train.data$x
ytrain <- train.data$y
fit <- RaScreen(xtrain, ytrain, B1 = 100, B2 = 100, iteration = 1, model = 'logistic',
cores = 2, criterion = 'ebic', gam = 0.5)
# Select n/logn variables from the selected percentage after one iteration round
RaRank(fit, selected.num = "n/logn", iteration = 1)
## End(Not run)
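The last example chooses subspaces with eBIC (gam = 0.5). Following Chen and Chen (2008), eBIC adds 2 * gam * log(choose(p, |S|)) to the ordinary BIC of a model fitted on subspace S, so gam = 0 recovers BIC. The sketch below computes this for a logistic regression in base R; the helper `ebic` is hypothetical, and whether RaScreen counts parameters exactly as `stats::BIC` does is an assumption.

```r
# Hypothetical helper: extended BIC (Chen and Chen, 2008) for a model fitted
# on a subspace of size d chosen from p candidate features.
ebic <- function(fit, p, d, gam = 0.5) {
  # BIC(fit) = -2 * log-likelihood + (number of parameters) * log(n);
  # eBIC adds a penalty for the number of subsets of size d
  BIC(fit) + 2 * gam * lchoose(p, d)
}

set.seed(4)
n <- 100; p <- 100
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(x[, 1]))   # binary response driven by feature 1
s <- c(1, 5)                        # a candidate subspace
fit <- glm(y ~ x[, s], family = binomial)
ebic(fit, p = p, d = length(s), gam = 0.5)
```

Larger gam penalizes subspaces more heavily when p is large, which favors smaller subspaces in high-dimensional screening.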