frankvali {mt}                                        R Documentation

Estimates Feature Ranking Error Rate with Resampling

Description

Estimates the error rate of feature ranking using resampling methods.

Usage

frankvali(dat, ...)
## Default S3 method:
frankvali(dat, cl, cl.method = "svm", fs.method = "fs.auc",
          fs.order = NULL, fs.len = "power2", pars = valipars(),
          tr.idx = NULL, ...)

## S3 method for class 'formula'
frankvali(formula, data = NULL, ..., subset, na.action = na.omit)

fs.cl(dat, cl, fs.order = colnames(dat), fs.len = 1:ncol(dat),
      cl.method = "svm", pars = valipars(), all.fs = FALSE, ...)

fs.cl.1(dat, cl, fs.order = colnames(dat), cl.method = "svm",
        pars = valipars(), agg_f = FALSE, ...)

Arguments

formula

A formula of the form groups ~ x1 + x2 + ... That is, the response is the grouping factor and the right-hand side specifies the (non-factor) discriminators.

data

Data frame from which variables specified in formula are preferentially to be taken.

dat

A matrix or data frame containing the explanatory variables if no formula is given as the principal argument.

cl

A factor specifying the class for each observation if no formula is given as the principal argument.

cl.method

Classification method to be used. Any classification method can be employed if it has a predict method (this requirement does not apply to knn) whose output is either the predicted class labels or a list with a component named class. Examples include randomForest, svm, knn and lda.

fs.method

Feature ranking method to be used. It is ignored if fs.order is not NULL.

fs.order

A vector of features in ranked order. In frankvali its default is NULL, in which case feature ranking is performed on the training data.

fs.len

Feature lengths (numbers of top-ranked features) used for validation. For details, see get.fs.len.

pars

A list specifying the resampling scheme, such as cross-validation, stratified cross-validation, leave-one-out cross-validation, randomised validation (holdout), bootstrap, .632 bootstrap or .632 plus bootstrap, together with control parameters for the calculation of accuracy. See valipars for details.

tr.idx

User-defined indices of training samples. These can be generated by trainind.

all.fs

A logical value indicating whether all features should be used for evaluation.

agg_f

A logical value indicating whether aggregated features should be used for evaluation.

...

Additional parameters passed to fs.method or cl.method.

subset

Optional vector specifying a subset of observations to be used.

na.action

A function which indicates what should happen when the data contain NAs. Defaults to na.omit.
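
The pars and tr.idx arguments together decide which samples train each classifier. As a rough illustration only, the sketch below generates stratified k-fold cross-validation training indices in base R, keeping class proportions roughly equal across folds. The helper strat_cv_train_idx is hypothetical: it is not part of mt, it is not how valipars or trainind are implemented, and the list-of-index-vectors shape it returns is an assumption about what tr.idx holds.

```r
## Hypothetical helper (not part of mt, not how valipars/trainind work):
## stratified k-fold cross-validation, one training-index vector per fold.
strat_cv_train_idx <- function(cl, nfold = 5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  cl <- factor(cl)
  fold <- integer(length(cl))
  for (lev in levels(cl)) {
    idx <- which(cl == lev)
    idx <- idx[sample.int(length(idx))]                # shuffle within the class
    fold[idx] <- rep_len(seq_len(nfold), length(idx))  # deal out round-robin
  }
  ## training set for fold k = every sample not held out in fold k
  lapply(seq_len(nfold), function(k) which(fold != k))
}

cl <- factor(rep(c("1", "2"), times = c(12, 8)))
tr <- strat_cv_train_idx(cl, nfold = 4, seed = 1)
sapply(tr, length)   # 15 15 15 15
```

Dealing samples out class by class means every held-out fold here contains three samples of class "1" and two of class "2".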

Details

These functions validate selected feature subsets using classification and resampling methods.

Any classification model can be used provided that its argument format is model(formula, data, subset, ...) and its corresponding predict method, predict.model(object, newdata, ...), returns either the predicted class labels alone or a list with a component named class, as lda and pcalda do.
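
The interface requirement above can be made concrete with a small base-R sketch. The helper pred_class is hypothetical (it is not an mt function): it returns the class component when predict gives a list, as predict.lda does, and otherwise returns the predictions unchanged. The usage assumes the MASS package, which supplies lda, is installed.

```r
library(MASS)  # provides lda(); assumed to be installed

## Hypothetical helper (not part of mt): extract predicted class labels
## whether predict() returns them directly or in a list component "class".
pred_class <- function(pred) {
  if (is.list(pred) && !is.null(pred[["class"]])) pred[["class"]] else pred
}

## Fit via the model(formula, data, subset, ...) interface on odd rows,
## then predict the even rows via predict(object, newdata, ...).
train <- seq(1, nrow(iris), by = 2)
fit <- lda(Species ~ ., data = iris, subset = train)
cls <- pred_class(predict(fit, newdata = iris[-train, ]))
acc <- mean(cls == iris$Species[-train])   # hold-out accuracy

## A classifier whose predict() already returns labels passes through as-is.
direct <- pred_class(factor(c("setosa", "virginica")))
```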

The resampling method can be one of cv, scv, loocv, boot, 632b and 632pb.
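
The 632b and 632pb schemes combine the apparent (resubstitution) error with the out-of-bag bootstrap error. The sketch below assumes mt follows the standard .632 estimator of Efron and the .632+ estimator of Efron and Tibshirani, written in the commonly used weighted form; the function names err632 and err632plus are hypothetical. Here err_app is the error of the classifier trained and tested on all samples, err_oob the average error on samples left out of the bootstrap replicates, and yhat the resubstitution predictions used for the no-information error rate.

```r
## Hypothetical helpers (not mt functions): standard .632 / .632+ estimators.
## err_app: apparent (resubstitution) error of the classifier fit to all samples
## err_oob: mean error on samples left out of the bootstrap replicates
err632 <- function(err_app, err_oob) {
  0.368 * err_app + 0.632 * err_oob
}

## y: true classes; yhat: resubstitution predictions (for the no-information rate)
err632plus <- function(err_app, err_oob, y, yhat) {
  lev <- union(levels(factor(y)), levels(factor(yhat)))
  p <- as.vector(table(factor(y, levels = lev))) / length(y)
  q <- as.vector(table(factor(yhat, levels = lev))) / length(yhat)
  gamma <- sum(p * (1 - q))              # no-information error rate
  err_oob <- min(err_oob, gamma)         # cap the out-of-bag error at gamma
  ## relative overfitting rate; 0 when there is no overfitting to correct
  rel <- if (err_oob > err_app && gamma > err_app) {
    (err_oob - err_app) / (gamma - err_app)
  } else {
    0
  }
  w <- 0.632 / (1 - 0.368 * rel)         # weight moves from 0.632 towards 1
  (1 - w) * err_app + w * err_oob
}

y    <- factor(c("a", "a", "b", "b"))
yhat <- factor(c("a", "a", "b", "b"))
err632(0.1, 0.3)
err632plus(0.1, 0.3, y, yhat)
```

When there is no sign of overfitting (rel is 0), the .632+ weight stays at 0.632 and the two estimators agree.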

The feature ranking method can be one of fs.rf, fs.auc, fs.welch, fs.anova, fs.bw, fs.snr, fs.kruskal, fs.relief and fs.rfe.
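
As an illustration of a filter-style ranking such as fs.auc, the sketch below scores each feature of a two-class problem by its area under the ROC curve, computed from the Mann-Whitney rank-sum identity, and orders features by max(AUC, 1 - AUC) so that strong negative associations also rank highly. This scoring rule is an assumption, not a description of mt's fs.auc; auc_rank is a hypothetical name, although its fs.order component mirrors the field used in the examples.

```r
## Hypothetical sketch (not mt's fs.auc): rank features of a two-class
## problem by AUC, using the Mann-Whitney rank-sum identity.
auc_rank <- function(x, y) {
  y <- factor(y)
  stopifnot(nlevels(y) == 2)
  pos <- y == levels(y)[2]               # second level treated as "positive"
  n1 <- sum(pos)
  n0 <- sum(!pos)
  auc <- apply(x, 2, function(v) {
    r <- rank(v)                         # average ranks handle ties
    (sum(r[pos]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  })
  score <- pmax(auc, 1 - auc)            # separability regardless of direction
  ord <- order(score, decreasing = TRUE)
  list(fs.order = colnames(x)[ord], auc = auc[ord])
}

y <- factor(rep(c("1", "2"), each = 10))
x <- cbind(noise   = rep(c(1, 2), 10),   # same distribution in both classes
           strong  = c(1:10, 21:30),     # class "2" always higher
           reverse = c(30:21, 10:1))     # class "2" always lower
res <- auc_rank(x, y)
res$fs.order
```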

Value

frankvali returns an object with the following components:

fs.method

Feature ranking method used.

cl.method

Classification method used.

fs.len

Feature lengths used.

fs.rank

Final feature ranking, obtained from fs.list by the Borda vote method.

err.all

Error rates for all computations.

err.iter

Error rate for each iteration.

err.avg

Average error rate over all iterations.

sampling

Sampling scheme used.

niter

Number of iterations.

nboot

Number of bootstrap replications if the sampling method is one of boot, 632b and 632pb.

nfold

Number of cross-validation folds if the sampling is cv or scv.

nrand

Number of replications if the sampling is random.

fs.list

Feature lists from all computations (returned only if fs.order is NULL).

fs.cl and fs.cl.1 return a matrix with columns acc (accuracy), auc (area under the ROC curve) and mar (class margin).
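
The fs.rank component is described as a Borda vote over fs.list. A minimal sketch of a Borda count, assuming every ranking in the list contains the same features: each feature's positions across all rankings are summed and features are sorted by increasing total, which is equivalent to awarding points that decrease with position. borda_rank is hypothetical, and mt's exact scoring and tie handling may differ.

```r
## Hypothetical sketch (not mt's code): Borda count over several rankings,
## assuming every ranking contains the same set of features.
borda_rank <- function(fs.list) {
  feats <- unique(unlist(fs.list))
  ## rows = features, columns = rankings; entries are positions (1 = best)
  pos <- do.call(cbind, lapply(fs.list, function(r) match(feats, r)))
  total <- rowSums(pos)                  # smaller total = consistently higher
  feats[order(total)]
}

fs.list <- list(c("a", "b", "c", "d"),
                c("b", "a", "c", "d"),
                c("a", "c", "b", "d"))
borda_rank(fs.list)   # "a" "b" "c" "d"
```

Here the position totals are 4 for a, 6 for b, 8 for c and 12 for d, so a wins even though b tops one of the three rankings.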

Note

fs.cl is a simplified version of frankvali. Both frankvali and fs.cl validate aggregated features only, that is, features accumulated from the top of the ranking downwards, whereas fs.cl.1 can validate either individual or aggregated features.
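
To illustrate the difference between the two validation modes, the sketch below lists the feature subsets each would evaluate for a hypothetical ranking ord: aggregated validation uses nested top-k subsets, while individual validation uses each feature on its own. It shows the subsets only; restricting the aggregated subsets to chosen sizes is meant in the spirit of fs.len, and the exact sizes mt evaluates are an assumption here.

```r
## Hypothetical ranking (best feature first); not real data
ord <- c("f3", "f1", "f7", "f2")

## Aggregated validation: nested subsets taken from the top of the ranking
agg_sets <- lapply(seq_along(ord), function(k) ord[seq_len(k)])

## Aggregated subsets restricted to chosen lengths, in the spirit of fs.len
fs.len <- c(1, 3)
len_sets <- lapply(fs.len, function(k) ord[seq_len(k)])

## Individual validation: every feature is evaluated on its own
ind_sets <- as.list(ord)
```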

Author(s)

Wanchang Lin

See Also

feat.rank.re, frank.err, valipars, boxplot.frankvali, get.fs.len

Examples

data(abr1)
dat <- abr1$pos
x   <- preproc(dat[,110:500], method="log10")  
y   <- factor(abr1$fact$class)        

dat <- dat.sel(x, y, choices=c("1","2"))
x.1 <- dat[[1]]$dat
y.1 <- dat[[1]]$cls

len  <- c(1:20,seq(25,50,5),seq(60,90,10),seq(100,300,50))
pars <- valipars(sampling="boot",niter=2, nreps=4)
res  <- frankvali(x.1,y.1,cl.method = "knn", fs.method="fs.auc",
                  fs.len=len, pars = pars)
res
summary(res)
boxplot(res)  

## Not run: 
## alternatively, apply feature ranking with a resampling procedure first
fs  <- feat.rank.re(x.1,y.1,method="fs.auc",pars = pars)

## then estimate the error rate of the feature ranking
res.1  <- frankvali(x.1,y.1,cl.method = "knn", fs.order=fs$fs.order,
                    fs.len=len, pars = pars)
res.1

## use formula
data.bin <- data.frame(y.1,x.1)

pars <- valipars(sampling="cv",niter=2,nreps=4)
res.2  <- frankvali(y.1~., data=data.bin,fs.method="fs.rfe",fs.len=len, 
                    cl.method = "knn",pars = pars)
res.2

## examples of fs.cl and fs.cl.1
fs <- fs.rf(x.1, y.1)
res.3 <- fs.cl(x.1,y.1,fs.order=fs$fs.order, fs.len=len,
               cl.method = "svm", pars = pars, all.fs=TRUE)

ord <- fs$fs.order[1:50]
## aggregated features
res.4 <- fs.cl.1(x.1,y.1,fs.order=ord, cl.method = "svm", pars = pars,
                 agg_f=TRUE)
               
## individual features
res.5 <- fs.cl.1(x.1,y.1,fs.order=ord, cl.method = "svm", pars = pars,
                 agg_f=FALSE)
                 

## End(Not run)

[Package mt version 2.0-1.20 Index]