efs_eval {EFS}    R Documentation

Evaluation of Ensemble Feature Selection

Description

Provides several evaluation tests of the output of ensemble_fs. There are performance tests, namely the logreg test and the permutation test, as well as stability tests based on the variance of feature importances and the Jaccard-index (see Details).

Usage

efs_eval(data, efs_table, file_name, classnumber, NA_threshold, logreg = TRUE,
  rf = TRUE, permutation = TRUE, p_num = 100, variances = TRUE,
  jaccard = TRUE, bs_num = 100, bs_percentage = 0.9)

Arguments

data

an object of class data.frame

efs_table

a table object of class matrix (retrieved from ensemble_fs)

file_name

a character string; the name used as the prefix of the PDF files that are created.

classnumber

a number indicating the column index of the class variable for binary classification

NA_threshold

a number in the range [0,1]. Features with a proportion of NAs greater than NA_threshold are deleted.

logreg

a logical value indicating whether to conduct an evaluation via logistic regression or not

rf

a logical value indicating whether to conduct an evaluation via random forest or not

permutation

a logical value indicating whether to conduct a permutation of the class variable or not

p_num

number of permutations

variances

a logical value indicating whether to calculate the variances of importances retrieved from bootstrapping or not

jaccard

a logical value indicating whether to calculate the Jaccard-index or not

bs_num

the number of bootstrap permutations of the importances

bs_percentage

a number in the range [0,1]. Proportion of randomly selected samples for bootstrapping
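
The effect of NA_threshold described above can be pictured with a minimal base R sketch. The data frame toy and its column names are hypothetical, and the filtering shown here is an assumption about the behaviour, not the package's internal code.

```r
# Hypothetical toy data; column names are illustrative only.
toy <- data.frame(
  a = c(1, NA, NA, NA),    # 75% missing
  b = c(1, 2, NA, 4),      # 25% missing
  class = c(0, 1, 0, 1)    # no missing values
)
NA_threshold <- 0.5

# Proportion of NAs in each column
na_prop <- colMeans(is.na(toy))

# Keep only columns whose NA proportion does not exceed the threshold
filtered <- toy[, na_prop <= NA_threshold, drop = FALSE]
print(names(filtered))  # "b" "class"
```

Column a is dropped because 75% of its values are missing, which is greater than the threshold of 0.5.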

Details

With logreg = TRUE, a logistic regression model with leave-one-out cross-validation (LOOCV) is fitted once on the selected features and once on all features. The AUC values of the two ROC curves are compared with roc.test. The ROC curves are plotted in the PDF file "file_name" + "LG-ROC.pdf".
With rf = TRUE, a random forest model is constructed and evaluated. As with logreg, the AUC values of the two ROC curves, one for all features and one for a subset of the best-ranked features, are compared with roc.test. The ROC curves are plotted in the PDF file "file_name" + "RF-ROC.pdf".
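
The LOOCV comparison for the logistic regression case can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy data, the helper functions loocv_pred and auc, and the choice of x1 as the "selected" feature are all hypothetical, and the AUC is computed with the rank-sum identity. It does not perform the roc.test comparison and is not the package's implementation.

```r
set.seed(1)
n <- 40
# Hypothetical toy data: x1 carries signal, x2 is noise
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$class <- rbinom(n, 1, plogis(1.5 * toy$x1))

# Leave-one-out predictions: refit without row i, then predict row i
loocv_pred <- function(formula, data) {
  sapply(seq_len(nrow(data)), function(i) {
    fit <- glm(formula, data = data[-i, ], family = binomial)
    predict(fit, newdata = data[i, , drop = FALSE], type = "response")
  })
}

# AUC via the rank-sum (Mann-Whitney) identity
auc <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# AUC with all features versus AUC with the (assumed) selected feature
auc_all <- auc(loocv_pred(class ~ x1 + x2, toy), toy$class)
auc_sel <- auc(loocv_pred(class ~ x1, toy), toy$class)
print(c(all = auc_all, selected = auc_sel))
```

The two AUC values correspond to the quantities that the function reports for the full and the reduced feature set.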

The permutation test (permutation = TRUE) compares the AUC of a logistic regression with p_num AUCs obtained from random permutations of the class variable, using t.test.
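
The idea behind the permutation test can be sketched as below. The toy data, the helpers auc and fit_auc, the use of in-sample AUCs, and the one-sided, one-sample form of the t-test are assumptions made for illustration; the exact test set-up used inside efs_eval may differ.

```r
set.seed(42)
n <- 60
# Hypothetical toy data with a single informative feature
x <- rnorm(n)
y <- rbinom(n, 1, plogis(2 * x))

# AUC via the rank-sum (Mann-Whitney) identity
auc <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# In-sample AUC of a logistic regression of y on x
fit_auc <- function(x, y) {
  fit <- glm(y ~ x, family = binomial)
  auc(fitted(fit), y)
}

auc_obs <- fit_auc(x, y)

# AUCs obtained after randomly permuting the class labels
p_num <- 100
auc_perm <- replicate(p_num, fit_auc(x, sample(y)))

# Are the permuted AUCs on average smaller than the observed AUC?
res <- t.test(auc_perm, mu = auc_obs, alternative = "less")
print(res$p.value)
```

A small p-value suggests that the observed AUC is unlikely to arise when the class labels carry no information.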

With variances = TRUE, the variances of the importances are calculated from a bootstrapping analysis. The number of bootstrap runs and the proportion of samples drawn in each run can be set by bs_num and bs_percentage. The function also produces the PDF file "file_name" + "_Variances.pdf". Additionally, the Jaccard-index of these bootstrapped importances can be calculated by setting jaccard = TRUE.
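
A rough sketch of the stability analysis is given below. Several parts are assumptions for illustration only: the importance function is a stand-in (absolute correlation with the class) rather than the EFS ensemble score, each run subsamples without replacement, and the Jaccard-index is taken as the mean pairwise overlap of the top_k features per run. The way efs_eval computes these quantities may differ.

```r
set.seed(7)
n <- 100
# Hypothetical toy data: f1 and f2 are informative, f3 and f4 are noise
toy <- data.frame(f1 = rnorm(n), f2 = rnorm(n), f3 = rnorm(n), f4 = rnorm(n))
y <- rbinom(n, 1, plogis(2 * toy$f1 + toy$f2))

bs_num <- 50
bs_percentage <- 0.9

# Stand-in importance: absolute correlation with the class (not the EFS score)
importance <- function(d, y) abs(cor(d, y))[, 1]

# One row of importances per bootstrap run
imp <- t(replicate(bs_num, {
  idx <- sample(n, size = floor(bs_percentage * n))
  importance(toy[idx, ], y[idx])
}))

# Variance of each feature's importance across runs
imp_var <- apply(imp, 2, var)

# Jaccard-index of two feature sets: |intersection| / |union|
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Top-ranked features of each run, then the mean pairwise Jaccard-index
top_k <- 2
top_sets <- lapply(seq_len(bs_num), function(i) {
  names(sort(imp[i, ], decreasing = TRUE))[seq_len(top_k)]
})
run_pairs <- combn(bs_num, 2)
jac <- mean(apply(run_pairs, 2, function(p) {
  jaccard(top_sets[[p[1]]], top_sets[[p[2]]])
}))

print(imp_var)
print(jac)
```

Low variances and a Jaccard-index close to 1 both indicate that the ranking changes little between bootstrap runs.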

Value

An object of class list, with the following components:
"AUC of LR with all parameters",
"AUC of LR with EFS parameter",
"P-value of LR-ROC test",
"AUC of RF with all parameters",
"AUC of RF with EFS parameter",
"P-value of RF-ROC test",
"P-value of permutation",
"Variances of feature importances",
"Jaccard-index".

Author(s)

Ursula Neumann

See Also

glm, roc, prediction, boxplot, tail, t.test

Examples

## Load the dataset into the environment
data(efsdata)
## Generate a ranking based on importance (with default
## NA_threshold = 0.7, cor_threshold = 0.2)
efs <- ensemble_fs(efsdata, 5, runs = 2)
## Conduct AUC test and permutation test
eval_example <- efs_eval(data = efsdata, efs_table = efs, file_name = 'eval_test',
                         classnumber = 5, NA_threshold = 0.2,
                         logreg = TRUE,
                         rf = FALSE,
                         permutation = TRUE, p_num = 2,
                         variances = FALSE, jaccard = FALSE)
## Calculating variances and the Jaccard-index can take several minutes of computation time

[Package EFS version 1.0.3 Index]