efs_eval {EFS} | R Documentation |
Evaluation of Ensemble Features Selection
Description
Provides several evaluation tests of the output of ensemble_fs.
There are performance tests, namely the logreg test and the permutation
test, as well as tests of stability via the variance
of feature importances and the Jaccard index (see Details).
Usage
efs_eval(data, efs_table, file_name, classnumber, NA_threshold, logreg = TRUE,
rf = TRUE, permutation = TRUE, p_num = 100, variances = TRUE,
jaccard = TRUE, bs_num = 100, bs_percentage = 0.9)
Arguments
data |
an object of class data.frame |
efs_table |
a table object of class matrix (retrieved
from ensemble_fs) |
file_name |
a character string, name which is used for the two possible PDF files. |
classnumber |
a number indicating the column index of the class variable for binary classification |
NA_threshold |
a number in the range [0,1]. Features with a proportion of NAs
greater than this threshold are deleted |
logreg |
a logical value indicating whether to conduct an evaluation via logistic regression or not |
rf |
a logical value indicating whether to conduct an evaluation via random forest or not |
permutation |
a logical value indicating whether to conduct a permutation of the class variable or not |
p_num |
number of permutations |
variances |
a logical value indicating whether to calculate the variances of importances retrieved from bootstrapping or not |
jaccard |
a logical value indicating whether to calculate the Jaccard index or not |
bs_num |
number of bootstrap resamplings used for the importances |
bs_percentage |
a number in the range [0,1]. Proportion of randomly selected samples for bootstrapping |
Details
With logreg = TRUE, logistic regression models are fitted with leave-one-out
cross-validation (LOOCV), one on the selected features and one on all features.
The AUC values of the two ROC curves are compared with roc.test.
The ROC curves are illustrated in the PDF file "file_name" + "LG-ROC.pdf".
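The LOOCV step described above can be sketched in base R as follows. This is a minimal illustration, not the code used inside efs_eval: the data frame toy, the helper functions auc_rank and loocv_logreg, and the choice of x1 as the "selected" feature are all hypothetical, and the AUC is computed with the rank-based (Mann-Whitney) formula instead of a ROC package. The comparison of the two AUCs with roc.test is not reproduced here.

```r
# Rank-based (Mann-Whitney) AUC; labels must be coded 0/1.
auc_rank <- function(scores, labels) {
  r <- rank(scores)                       # average ranks handle ties
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Leave-one-out predictions from a logistic regression on the given features.
loocv_logreg <- function(df, class_col, features) {
  form <- reformulate(features, response = class_col)
  vapply(seq_len(nrow(df)), function(i) {
    # Fit on all rows except i, then predict the held-out row.
    fit <- glm(form, data = df[-i, , drop = FALSE], family = binomial())
    predict(fit, newdata = df[i, , drop = FALSE], type = "response")
  }, numeric(1))
}

# Synthetic data (hypothetical): x1 is informative, x2 and x3 are noise.
set.seed(1)
n <- 60
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$cls <- rbinom(n, 1, plogis(1.5 * toy$x1))

# AUC with all features versus AUC with the (assumed) selected feature.
auc_all <- auc_rank(loocv_logreg(toy, "cls", c("x1", "x2", "x3")), toy$cls)
auc_sel <- auc_rank(loocv_logreg(toy, "cls", "x1"), toy$cls)
```

Because every prediction comes from a model that never saw the held-out row, both AUC values are out-of-sample estimates and can be compared directly.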
With rf = TRUE, a random forest model is constructed and evaluated.
Analogous to the logistic regression test, the AUC values of the two ROC curves,
one for all features and one for a subset of the best-ranked features,
are compared with roc.test.
The ROC curves are illustrated in the PDF file "file_name" + "RF-ROC.pdf".
The permutation test (permutation = TRUE) compares the AUC of
a logistic regression with p_num AUCs obtained from random
permutations of the class variable, using a t.test.
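The idea behind the permutation test can be sketched as follows. This is an illustration under stated assumptions, not the implementation in efs_eval: the data frame toy and the helpers auc_rank and fit_auc are hypothetical, the AUC is computed in-sample for brevity, and the comparison is set up as a one-sample t.test of the permuted AUCs against the observed AUC, which is only one plausible reading of the description above.

```r
# Rank-based (Mann-Whitney) AUC; labels must be coded 0/1.
auc_rank <- function(scores, labels) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# In-sample AUC of a logistic regression using all other columns as predictors.
fit_auc <- function(df, class_col) {
  fit <- glm(as.formula(paste(class_col, "~ .")), data = df, family = binomial())
  auc_rank(fitted(fit), df[[class_col]])
}

# Synthetic data (hypothetical): x1 carries signal, x2 and x3 are noise.
set.seed(2)
n <- 60
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$cls <- rbinom(n, 1, plogis(1.5 * toy$x1))

p_num <- 100                      # number of permutations
auc_obs <- fit_auc(toy, "cls")    # AUC with the true class labels

# AUCs after shuffling the class variable, which breaks any real association.
perm_aucs <- replicate(p_num, {
  shuffled <- toy
  shuffled$cls <- sample(shuffled$cls)
  fit_auc(shuffled, "cls")
})

# One-sample t-test: do the permuted AUCs differ from the observed AUC?
p_value <- t.test(perm_aucs, mu = auc_obs)$p.value
```

A small p-value indicates that the observed AUC is unlikely to arise when the class labels carry no information about the features.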
With variances = TRUE, the variances of the feature importances are
calculated from a bootstrapping analysis. The number of bootstrap
resamplings and the proportion of samples drawn in each can be set
by bs_num and bs_percentage.
The function also provides a PDF file "file_name" + "_Variances.pdf".
Additionally, the Jaccard index of these bootstrapped importances
can be calculated by setting jaccard = TRUE.
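The two stability measures can be illustrated with a short base R sketch. Several parts are assumptions made for the example and are not taken from the package: the absolute correlation with the class serves as a stand-in importance score (the real importances come from ensemble_fs), subsamples are drawn without replacement, top_k is a hypothetical cut-off defining the "selected" set, and the Jaccard index is averaged over all pairs of resamplings.

```r
# Jaccard index of two feature sets: |A intersect B| / |A union B|.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Synthetic data (hypothetical): x1 and x2 are informative, x3 and x4 are noise.
set.seed(3)
n <- 80
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
cls <- rbinom(n, 1, plogis(1.5 * toy$x1 + toy$x2))

bs_num <- 50          # number of resamplings
bs_percentage <- 0.9  # proportion of samples drawn in each resampling
top_k <- 2            # hypothetical size of the selected feature set

# One row per resampling, one column per feature.
imp <- t(replicate(bs_num, {
  idx <- sample(n, size = floor(bs_percentage * n))
  abs(cor(toy[idx, ], cls[idx]))[, 1]   # stand-in importance score
}))

# Variance of each feature's importance across resamplings.
imp_var <- apply(imp, 2, var)

# Top-k feature set per resampling, then the mean pairwise Jaccard index.
top_sets <- lapply(seq_len(bs_num), function(i) {
  names(sort(imp[i, ], decreasing = TRUE))[seq_len(top_k)]
})
pairs <- combn(bs_num, 2)
mean_jaccard <- mean(apply(pairs, 2, function(p) {
  jaccard(top_sets[[p[1]]], top_sets[[p[2]]])
}))
```

Low variances and a mean Jaccard index close to 1 both indicate that the ranking changes little when the sample is perturbed.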
Value
An object of class list, with the following components:
"AUC of LR with all parameters",
"AUC of LR with EFS parameter",
"P-value of LR-ROC test",
"AUC of RF with all parameters",
"AUC of RF with EFS parameter",
"P-value of RF-ROC test",
"P-value of permutation",
"Variances of feature importances",
"Jaccard-index".
Author(s)
Ursula Neumann
See Also
glm, roc, prediction, boxplot, tail, t.test
Examples
## Loading dataset in environment
data(efsdata)
## Generate a ranking based on importance (with default
## NA_threshold = 0.7, cor_threshold = 0.2)
efs <- ensemble_fs(efsdata, 5, runs = 2)
## Conduct AUC test and permutation test
eval_example <- efs_eval(data = efsdata, efs_table = efs, file_name = 'eval_test',
classnumber = 5, NA_threshold = 0.2,
logreg = TRUE,
rf = FALSE,
permutation = TRUE, p_num = 2,
variances = FALSE, jaccard = FALSE)
## Calculating variances and the Jaccard-index can take several minutes computation time