crossValidationFeatureSelection_Res {FRESA.CAD} | R Documentation |
NeRI-based selection of a linear, logistic, or Cox proportional hazards regression model from a set of candidate variables
Description
This function performs a cross-validation analysis of a feature selection algorithm based on net residual improvement (NeRI) and returns a predictive model. The algorithm consists of a NeRI-based feature selection, followed by an update procedure, and ends with a bootstrapped backward feature elimination. The user can control how many training and blind test sets will be evaluated.
Usage
crossValidationFeatureSelection_Res(size = 10,
                                    fraction = 1.0,
                                    pvalue = 0.05,
                                    loops = 100,
                                    covariates = "1",
                                    Outcome,
                                    timeOutcome = "Time",
                                    variableList,
                                    data,
                                    maxTrainModelSize = 20,
                                    type = c("LM", "LOGIT", "COX"),
                                    testType = c("Binomial",
                                                 "Wilcox",
                                                 "tStudent",
                                                 "Ftest"),
                                    startOffset = 0,
                                    elimination.bootstrap.steps = 100,
                                    trainFraction = 0.67,
                                    trainRepetition = 9,
                                    setIntersect = 1,
                                    unirank = NULL,
                                    print = TRUE,
                                    plots = TRUE,
                                    lambda = "lambda.1se",
                                    equivalent = FALSE,
                                    bswimsCycles = 10,
                                    usrFitFun = NULL,
                                    featureSize = 0)
Arguments
size |
The number of candidate variables to be tested (the first |
fraction |
The fraction of data (sampled with replacement) to be used as train |
pvalue |
The maximum p-value, associated with the NeRI, allowed for a term in the model |
loops |
The number of bootstrap loops |
covariates |
A string of the type "1 + var1 + var2" that defines which variables will always be included in the models (as covariates) |
Outcome |
The name of the column in |
timeOutcome |
The name of the column in |
variableList |
A data frame with two columns. The first one must have the names of the candidate variables and the other one the description of such variables |
data |
A data frame where all variables are stored in different columns |
maxTrainModelSize |
Maximum number of terms that can be included in the model |
type |
Fit type: Logistic ("LOGIT"), linear ("LM"), or Cox proportional hazards ("COX") |
testType |
Type of non-parametric test to be evaluated by the |
startOffset |
Only terms whose position in the model is larger than the |
elimination.bootstrap.steps |
The number of bootstrap loops for the backwards elimination procedure |
trainFraction |
The fraction of data (sampled with replacement) to be used as the training set for the cross-validation procedure |
setIntersect |
The intercept of the model (to force a zero intercept, set this value to 0) |
trainRepetition |
The number of cross-validation folds (it should be at least equal to |
unirank |
A list with the results yielded by the |
print |
Logical. If |
plots |
Logical. If |
lambda |
The value passed to the s parameter when extracting coefficients from the glmnet cross-validation fit |
equivalent |
If set to TRUE, the cross-validation will compute the equivalent model |
bswimsCycles |
The maximum number of models to be returned by |
usrFitFun |
A user fitting function to be evaluated by the cross validation procedure |
featureSize |
The original number of features to be explored in the data frame. |
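The variableList, covariates, and data arguments described above can be prepared as in the following minimal sketch. The data set, its column names (age, bmi, marker1, marker2), and the variableList column names (Name, Description) are hypothetical and chosen only for illustration; the text above requires only that the first column hold the candidate names and the second their descriptions. The call itself is left commented out because it needs the FRESA.CAD package and can take a long time to run.

```r
# Hypothetical data set: one outcome column plus predictors (names are illustrative).
set.seed(1)
data <- data.frame(
  Outcome = rnorm(50),
  age     = rnorm(50, mean = 60, sd = 10),
  bmi     = rnorm(50, mean = 25, sd = 4),
  marker1 = rnorm(50),
  marker2 = rnorm(50)
)

# Candidate variables: the first column holds the names, the second the descriptions.
variableList <- data.frame(
  Name        = c("bmi", "marker1", "marker2"),
  Description = c("Body mass index", "Hypothetical marker 1", "Hypothetical marker 2"),
  stringsAsFactors = FALSE
)

# Covariates that are always included, written as "1 + var1 + var2".
covariates <- paste(c("1", "age"), collapse = " + ")

## Not run: requires the FRESA.CAD package
# cv <- crossValidationFeatureSelection_Res(size = 3,
#                                           covariates = covariates,
#                                           Outcome = "Outcome",
#                                           variableList = variableList,
#                                           data = data,
#                                           type = "LM",
#                                           testType = "Ftest")
```

Every name in the first column of variableList should also be a column of data; checking this before the call helps catch typos early.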
Details
This function produces a set of data and plots that can be used to inspect the degree of over-fitting or shrinkage of a model. It uses bootstrapped data, cross-validation data, and, if possible, retrain data.
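The idea of comparing train and blind test performance to inspect over-fitting can be illustrated with base R alone. The sketch below is not the FRESA.CAD procedure: it fits a fixed linear model with lm on simulated data, uses simple subsampling without replacement, and borrows only the default values of trainFraction (0.67) and trainRepetition (9) from the usage above. All variable names are hypothetical.

```r
# Simulated data with a known linear relationship (illustrative only).
set.seed(42)
n   <- 120
x1  <- rnorm(n)
x2  <- rnorm(n)
toy <- data.frame(x1 = x1, x2 = x2, y = 1 + 2 * x1 - x2 + rnorm(n))

trainFraction <- 0.67  # fraction of rows used for training in each repetition
repetitions   <- 9     # number of train/test repetitions

# Root-mean-square error between observed and predicted values.
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# For each repetition: fit on the training rows, evaluate on both sets.
res <- t(sapply(seq_len(repetitions), function(i) {
  idx <- sample(n, size = floor(trainFraction * n))
  fit <- lm(y ~ x1 + x2, data = toy[idx, ])
  c(train = rmse(toy$y[idx], fitted(fit)),
    test  = rmse(toy$y[-idx], predict(fit, newdata = toy[-idx, ])))
}))

# Average train and blind test RMSE across repetitions.
colMeans(res)
```

A blind test RMSE that is consistently much larger than the train RMSE suggests over-fitting; similar values suggest the model generalizes to unseen rows.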
Value
formula.list |
A list containing objects of class |
Models.testPrediction |
A data frame with the blind test set predictions made at each fold of the cross validation (Full B:SWiMS,Median,Bagged,Forward,Backward Elimination), where the models used to generate such predictions ( |
FullBSWiMS.testPrediction |
A data frame similar to |
BSWiMS |
A list containing the values returned by |
forwardSelection |
A list containing the values returned by |
updatedforwardModel |
A list containing the values returned by |
testRMSE |
The global blind test root-mean-square error (RMSE) of the cross-validation procedure |
testPearson |
The global blind test Pearson r product-moment correlation coefficient of the cross-validation procedure |
testSpearman |
The global blind test Spearman rank correlation coefficient of the cross-validation procedure |
FulltestRMSE |
The global blind test RMSE of the Full model |
FullTestPearson |
The global blind test Pearson r product-moment correlation coefficient of the Full model |
FullTestSpearman |
The global blind test Spearman rank correlation coefficient of the Full model |
trainRMSE |
The train RMSE at each fold of the cross-validation procedure |
trainPearson |
The train Pearson r product-moment correlation coefficient at each fold of the cross-validation procedure |
trainSpearman |
The train Spearman rank correlation coefficient at each fold of the cross-validation procedure |
FullTrainRMSE |
The train RMSE of the Full model at each fold of the cross-validation procedure |
FullTrainPearson |
The train Pearson r product-moment correlation coefficient of the Full model at each fold of the cross-validation procedure |
FullTrainSpearman |
The train Spearman rank correlation coefficient of the Full model at each fold of the cross-validation procedure |
testRMSEAtFold |
The blind test RMSE at each fold of the cross-validation procedure |
FullTestRMSEAtFold |
The blind test RMSE of the Full model at each fold of the cross-validation procedure |
Fullenet |
An object of class |
LASSO.testPredictions |
A data frame similar to |
LASSOVariables |
A list with the elastic net Full model and the models found at each cross-validation fold |
byFoldTestMS |
A vector with the mean square error for each blind fold |
byFoldTestSpearman |
A vector with the Spearman correlation between prediction and outcome for each blind fold |
byFoldTestPearson |
A vector with the Pearson correlation between prediction and outcome for each blind fold |
byFoldCstat |
A vector with the C-index (Somers' Dxy rank correlation : |
CVBlindPearson |
A vector with the Pearson correlation between the outcome and prediction for each repeated experiment |
CVBlindSpearman |
A vector with the Spearman correlation between the outcome and prediction for each repeated experiment |
CVBlindRMS |
A vector with the root-mean-square (RMS) error between the outcome and prediction for each repeated experiment |
Models.trainPrediction |
A data frame with the outcome and the train prediction of every model |
FullBSWiMS.trainPrediction |
A data frame with the outcome and the train prediction at each CV fold for the main model |
LASSO.trainPredictions |
A data frame with the outcome and the prediction of each enet lasso model |
uniTrainMSS |
A data frame with mean square of the train residuals from the univariate models of the model terms |
uniTestMSS |
A data frame with mean square of the test residuals of the univariate models of the model terms |
BSWiMS.ensemble.prediction |
The ensemble prediction by all models on the test data |
AtOptFormulas.list |
The list of formulas with "optimal" performance |
ForwardFormulas.list |
The list of formulas produced by the forward procedure |
baggFormulas.list |
The list of the bagged models |
LassoFilterVarList |
The list of variables used by LASSO fitting |
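The error and correlation summaries listed above (RMSE, Pearson, Spearman, and the per-fold mean square error) follow standard definitions. The base R sketch below shows how such quantities can be computed from a vector of outcomes and a vector of predictions; the numbers and the fold assignment are made up for illustration and do not come from the function's output.

```r
# Hypothetical observed outcomes, blind test predictions, and fold labels.
outcome    <- c(2.1, 3.4, 1.8, 5.0, 4.2, 3.9)
prediction <- c(2.4, 3.0, 2.2, 4.6, 4.5, 3.5)
fold       <- rep(1:2, each = 3)

# Global root-mean-square error of the predictions.
rmse <- sqrt(mean((outcome - prediction)^2))

# Pearson product-moment and Spearman rank correlation coefficients.
pearson  <- cor(outcome, prediction, method = "pearson")
spearman <- cor(outcome, prediction, method = "spearman")

# Mean square error computed separately for each fold.
byFoldMS <- tapply((outcome - prediction)^2, fold, mean)
```

Pearson measures linear agreement between prediction and outcome, whereas Spearman depends only on the ranks, so the two can differ when the relationship is monotonic but not linear.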
Author(s)
Jose G. Tamez-Pena and Antonio Martinez-Torteya
See Also
crossValidationFeatureSelection_Bin,
improvedResiduals,
bootstrapVarElimination_Res