benchmarking {FRESA.CAD} | R Documentation
Compare performance of different model fitting/filtering algorithms
Description
Evaluates a data set with a collection of fitting/filtering methods and returns the observed cross-validation performance.
Usage
BinaryBenchmark(theData = NULL, theOutcome = "Class", reps = 100, trainFraction = 0.5,
                referenceCV = NULL, referenceName = "Reference",
                referenceFilterName = "Reference")
RegresionBenchmark(theData = NULL, theOutcome = "Class", reps = 100, trainFraction = 0.5,
                   referenceCV = NULL, referenceName = "Reference",
                   referenceFilterName = "Reference")
OrdinalBenchmark(theData = NULL, theOutcome = "Class", reps = 100, trainFraction = 0.5,
                 referenceCV = NULL, referenceName = "Reference",
                 referenceFilterName = "Reference")
CoxBenchmark(theData = NULL, theOutcome = "Class", reps = 100, trainFraction = 0.5,
             referenceCV = NULL, referenceName = "Reference",
             referenceFilterName = "COX.BSWiMS")
Arguments
theData |
The data frame |
theOutcome |
The outcome feature |
reps |
The number of times that the random cross-validation will be performed |
trainFraction |
The fraction of the data used for training. |
referenceCV |
A single random cross-validation (randomCV) object to be benchmarked, or a list of cross-validation objects to be compared |
referenceName |
The name of the reference classifier to be used in the reporting tables |
referenceFilterName |
The name of the reference filter to be used in the reporting tables |
Details
The benchmark functions report the performance of different classification algorithms (BinaryBenchmark), regression algorithms (RegresionBenchmark), ordinal regression algorithms (OrdinalBenchmark), or survival (Cox) models (CoxBenchmark).
The evaluation is based on the random cross-validation method (randomCV), which repeatedly and randomly splits the data into train and test sets.
Alternatively, the user can provide a cross-validated object (referenceCV) that defines the train-test partitions to be used.
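For example (a minimal sketch that mirrors the calls in the Examples section; myData and "Class" stand in for the user's data frame and outcome column):

  # build a reference cross-validation object and benchmark it against the default methods
  cv <- randomCV(myData, "Class", MASS::lda, trainFraction = 0.8, repetitions = 10)
  bp <- BinaryBenchmark(referenceCV = cv, referenceName = "LDA")

referenceCV may also be a list of such cross-validation objects to be compared.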
The BinaryBenchmark compares BSWiMS, Random Forest, RPART, LASSO, SVM/mRMR, KNN, and their ensemble in their ability to correctly classify the test data. Furthermore, it evaluates the ability of the following feature selection algorithms: BSWiMS (or referenceCV), LASSO, RPART, RF/BSWiMS, IDI, NRI, t-test, Wilcoxon, Kendall, and mRMR to select the best set of features for the following classification methods: SVM, KNN, Naive Bayes, Random Forest, Nearest Centroid (NC) with root sum square (RSS), and NC with Spearman correlation.
The RegresionBenchmark compares BSWiMS, Random Forest, RPART, LASSO, SVM/mRMR, and their ensemble in their ability to correctly predict the test data. Furthermore, it evaluates the ability of the following feature selection algorithms: BSWiMS (or referenceCV), LASSO, RPART, RF/BSWiMS, F-Test, W-Test, Pearson, Kendall, and mRMR to select the best set of features for the following regression methods: Linear Regression, Robust Regression, Ridge Regression, LASSO, SVM, and Random Forest.
The OrdinalBenchmark compares BSWiMS, Random Forest, RPART, LASSO, KNN, SVM, and their ensemble in their ability to correctly predict the test data. Furthermore, it evaluates the ability of the following feature selection algorithms: BSWiMS (or referenceCV), LASSO, RPART, RF/BSWiMS, F-Test, Kendall, and mRMR to select the best set of features for the following regression methods: Ordinal Regression, KNN, SVM, Random Forest, and Naive Bayes.
The CoxBenchmark compares BSWiMS, LASSO, BeSS, and univariate Cox analysis in their ability to correctly predict the risk of an event. It uses Cox regression with the four alternatives, and BSWiMS and LASSO are also compared as wrapper methods (see the sketch at the end of the Examples).
Value
errorciTable |
the matrix of the balanced error with the 95% CI |
accciTable |
the matrix of the classification accuracy with the 95% CI |
aucTable |
the matrix of the ROC AUC with the 95% CI |
senTable |
the matrix of the sensitivity with the 95% CI |
speTable |
the matrix of the specificity with the 95% CI |
errorciTable_filter |
the matrix of the balanced error with the 95% CI for filter methods |
accciTable_filter |
the matrix of the classification accuracy with the 95% CI for filter methods |
senciTable_filter |
the matrix of the classification sensitivity with the 95% CI for filter methods |
speciTable_filter |
the matrix of the classification specificity with the 95% CI for filter methods |
aucTable_filter |
the matrix of the ROC AUC with the 95% CI for filter methods |
CorTable |
the matrix of the Pearson correlation with the 95% CI |
RMSETable |
the matrix of the root mean square error (RMSE) with the 95% CI |
BiasTable |
the matrix of the prediction bias with the 95% CI |
CorTable_filter |
the matrix of the Pearson correlation with the 95% CI for filter methods |
RMSETable_filter |
the matrix of the root mean square error (RMSE) with the 95% CI for filter methods |
BiasTable_filter |
the matrix of the prediction bias with the 95% CI for filter methods |
BMAETable |
the matrix of the balanced mean absolute error (MAE) with the 95% CI for filter methods |
KappaTable |
the matrix of the Kappa value with the 95% CI |
BiasTable |
the matrix of the prediction bias with the 95% CI |
KendallTable |
the matrix of the Kendall correlation with the 95% CI |
MAETable_filter |
the matrix of the mean absolute error (MAE) with the 95% CI for filter methods |
KappaTable_filter |
the matrix of the Kappa value with the 95% CI for filter methods |
BiasTable_filter |
the matrix of the prediction bias with the 95% CI for filter methods |
KendallTable_filter |
the matrix of the Kendall correlation with the 95% CI for filter methods |
CIRiskTable |
the matrix of the concordance index on risk with the 95% CI |
LogRankTable |
the matrix of the log-rank test with the 95% CI |
CIRisksTable_filter |
the matrix of the concordance index on risk with the 95% CI for the filter methods |
LogRankTable_filter |
the matrix of the log-rank test with the 95% CI for the filter methods |
times |
the average CPU time used by each method |
jaccard_filter |
the average Jaccard index of the feature selection methods |
TheCVEvaluations |
the output of the randomCV evaluations of each method |
testPredictions |
a matrix with all the test predictions |
featureSelectionFrequency |
the frequency of feature selection |
cpuElapsedTimes |
the mean elapsed CPU times |
Author(s)
Jose G. Tamez-Pena
See Also
randomCV
Examples
## Not run:
### Binary Classification Example ####
# Start the graphics device driver to save all plots in a pdf format
pdf(file = "BinaryClassificationExample.pdf",width = 8, height = 6)
# Get the stage C prostate cancer data from the rpart package
data(stagec,package = "rpart")
# Prepare the data. Create a model matrix without the event time
stagec$pgtime <- NULL
stagec$eet <- as.factor(stagec$eet)
options(na.action = 'na.pass')
stagec_mat <- cbind(pgstat = stagec$pgstat,
as.data.frame(model.matrix(pgstat ~ .,stagec))[-1])
# Impute the missing data
dataCancerImputed <- nearestNeighborImpute(stagec_mat)
dataCancerImputed[,1:ncol(dataCancerImputed)] <- sapply(dataCancerImputed,as.numeric)
# Cross-validating an LDA classifier
# 80% of the data will be used to train; the remaining 20% to test
cv <- randomCV(dataCancerImputed,"pgstat",MASS::lda,trainFraction = 0.8,
repetitions = 10,featureSelectionFunction = univariate_tstudent,
featureSelection.control = list(limit = 0.5,thr = 0.975));
# Compare the LDA classifier with other methods
cp <- BinaryBenchmark(referenceCV = cv,referenceName = "LDA",
referenceFilterName="t.Student")
pl <- plot(cp,prefix = "StageC: ")
# Benchmark the default classification and filter methods (no referenceCV supplied)
# 80% of the data will be used for training
cp <- BinaryBenchmark(theData = dataCancerImputed,
                      theOutcome = "pgstat", reps = 10, trainFraction = 0.8)
# plot the Cross Validation Metrics
pl <- plot(cp,prefix = "Stagec:");
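# A minimal sketch (not part of the original example): inspect some of the
# cross-validated performance tables documented in the Value section
cp$accciTable                  # classification accuracy with the 95% CI
cp$aucTable                    # ROC AUC with the 95% CI
cp$jaccard_filter              # average Jaccard index of the feature selection methods
cp$featureSelectionFrequency   # how often each feature was selected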
# Shut down the graphics device driver
dev.off()
#### Regression Example ######
# Start the graphics device driver to save all plots in a pdf format
pdf(file = "RegressionExample.pdf",width=8, height=6)
# Get the body fat data from the TH.data package
data("bodyfat", package = "TH.data")
# Benchmark regression methods and filter methods.
# 80% of the data will be used for training
cp <- RegresionBenchmark(theData = bodyfat,
                         theOutcome = "DEXfat", reps = 10, trainFraction = 0.8)
# plot the Cross Validation Metrics
pl <- plot(cp,prefix = "Body Fat:");
# Shut down the graphics device driver
dev.off()
#### Ordinal Regression Example #####
# Start the graphics device driver to save all plots in a pdf format
pdf(file = "OrdinalRegressionExample.pdf",width=8, height=6)
# Get the GBSG2 data
data("GBSG2", package = "TH.data")
# Prepare the model frame for benchmarking
GBSG2$time <- NULL;
GBSG2$cens <- NULL;
GBSG2_mat <- cbind(tgrade = as.numeric(GBSG2$tgrade),
as.data.frame(model.matrix(tgrade~.,GBSG2))[-1])
# Benchmark regression methods and filter methods.
# 30% of the data will be used for training
cp <- OrdinalBenchmark(theData = GBSG2_mat,
                       theOutcome = "tgrade", reps = 10, trainFraction = 0.3)
# plot the Cross Validation Metrics
pl <- plot(cp,prefix = "GBSG:");
# Shut down the graphics device driver
dev.off()
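#### Cox Benchmark Example (a minimal sketch) ####
# Illustrative sketch only, not part of the original examples. It assumes
# that CoxBenchmark takes the event indicator as theOutcome and that the
# follow-up time column stays in theData; check the randomCV documentation
# for the exact survival-outcome encoding before running.
data(stagec, package = "rpart")
stagec$eet <- as.factor(stagec$eet)
options(na.action = 'na.pass')
# Keep both the follow-up time (pgtime) and the event indicator (pgstat)
stagec_surv <- cbind(pgstat = stagec$pgstat, pgtime = stagec$pgtime,
                     as.data.frame(model.matrix(pgstat ~ . - pgtime, stagec))[-1])
dataCoxImputed <- nearestNeighborImpute(stagec_surv)
dataCoxImputed[,1:ncol(dataCoxImputed)] <- sapply(dataCoxImputed, as.numeric)
cpCox <- CoxBenchmark(theData = dataCoxImputed, theOutcome = "pgstat",
                      reps = 10, trainFraction = 0.8)
# Survival-specific performance tables (see the Value section)
cpCox$CIRiskTable    # concordance index on risk with the 95% CI
cpCox$LogRankTable   # log-rank test with the 95% CI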
## End(Not run)