cross_val {CARRoT} | R Documentation |
Cross-validation run
Description
Function running a single cross-validation by partitioning the data into a training set and a test set
Usage
cross_val(
  vari,
  outi,
  c,
  rule,
  part,
  l,
  we,
  vari_col,
  preds,
  mode,
  cmode,
  predm,
  cutoff,
  objfun,
  minx = 1,
  maxx = NULL,
  nr = NULL,
  maxw = NULL,
  st = NULL,
  corr = 1,
  Rsq = F,
  marg = 0,
  n_tr,
  preds_tr
)
Arguments
vari |
set of predictors |
outi |
array of outcomes |
c |
set of all indices of the predictors |
rule |
an Events per Variable (EPV) rule, defaults to 10 |
part |
indicates partition of the original data-set into training and test set in a proportion |
l |
number of observations |
we |
weights of the predictors |
vari_col |
overall number of predictors |
preds |
array to write predictions for the test split into, initially empty |
mode |
|
cmode |
|
predm |
|
cutoff |
cut-off value for logistic regression |
objfun |
|
minx |
minimum number of predictors to be included in a regression, defaults to 1 |
maxx |
maximum number of predictors to be included in a regression, defaults to the maximum feasible number according to the one in ten rule |
nr |
a subset of the data-set, such that |
maxw |
maximum weight of predictors to be included in a regression, defaults to the maximum weight according to the one in ten rule |
st |
a subset of predictors to be always included in a predictive model, defaults to the empty set |
corr |
maximum correlation between a pair of predictors in a model |
Rsq |
whether the R-squared statistic constraint is introduced |
marg |
margin of error for the R-squared statistic constraint |
n_tr |
size of the training set |
preds_tr |
array to write predictions for the training split into, initially empty |
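The rule, maxx and maxw arguments all refer to an Events per Variable bound. As a rough illustration only, the one in ten rule for a binary outcome can be sketched as below. The helper max_predictors_epv is hypothetical and not part of CARRoT; the package derives its own limit internally (see compute_max_weight under See Also), and that logic may differ, for example by accounting for predictor weights.

```r
# Hypothetical helper, not part of CARRoT: upper bound on the number of
# predictors under an Events per Variable (EPV) rule for a binary outcome.
max_predictors_epv <- function(outcome, rule = 10) {
  # The limiting count is the size of the less frequent outcome class
  events <- min(sum(outcome == 1), sum(outcome == 0))
  # Allow one predictor per `rule` events, rounded down
  floor(events / rule)
}

# Assumed toy data: 30 events and 70 non-events
outcome <- c(rep(1, 30), rep(0, 70))
max_epv <- max_predictors_epv(outcome, rule = 10)
```

With this toy outcome the bound is driven by the 30 events rather than by the 100 observations, which is why a larger data set with a rare outcome can still admit only a few predictors.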
Value
regr |
An M x N matrix of sums of the absolute errors for each element of the test set for each feasible regression. M is the maximum feasible number of variables included in a regression and N is the maximum feasible number of regressions of a fixed size. The row index indicates the number of variables included in a regression, so each row holds the results of regressions with the same number of variables, and the columns correspond to different subsets of predictors. |
regrr |
An M x N matrix of sums of the relative errors for each element of the test set (only for |
nvar |
Maximum feasible number of variables in the regression |
emp |
The accuracy of always predicting the more likely outcome as suggested by the training set (only for |
In regr and regrr, NA values are possible since for some numbers of variables there are fewer feasible regressions than for others.
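To see why a matrix with one row per model size contains NA entries, the following sketch counts the predictor subsets of each size and fills a matrix of the same shape as the one described above. It assumes four predictors and no additional constraints (weights, correlation, always-included predictors); the names n_pred, counts and regr_shape are illustrative and do not come from the package.

```r
# Assumed number of predictors, for illustration only
n_pred <- 4
sizes <- 1:n_pred

# Number of distinct predictor subsets of each size k
counts <- choose(n_pred, sizes)

# One row per subset size, one column per subset of that size
regr_shape <- matrix(NA_real_, nrow = n_pred, ncol = max(counts))
for (k in sizes) {
  # 0 marks a slot that a regression of size k would fill
  regr_shape[k, seq_len(counts[k])] <- 0
}

# Slots left as NA: sizes with fewer subsets than the widest row
n_missing <- sum(is.na(regr_shape))
```

Here the row for two predictors is the widest, so every other row is padded with NA up to that width.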
See Also
Uses compute_max_weight, sum_weights_sub, make_numeric_sets, get_predictions_lin, get_predictions, get_probabilities, AUC, combn
Examples
#creating variables
vari<-matrix(c(1:100,seq(1,300,3)),ncol=2)
#creating outcomes
out<-rbinom(100,1,0.3)
#creating array for predictions
pr<-array(NA,c(2,2))
pr_tr<-array(NA,c(2,2))
#passing set of the indices of the predictors
c<-c(1:2)
#passing the weights of the predictors
we<-c(1,1)
#setting the mode
m<-'binary'
#running the function
cross_val(vari,out,c,10,10,100,we,2,pr,m,'det','exact',0.5,'acc',nr=c(1,4),n_tr=90,preds_tr=pr_tr)
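As a point of comparison for the emp value described under Value, the baseline accuracy of always predicting the more likely training outcome can be sketched in base R. This is an illustration under stated assumptions, not the package's code: the split below simply takes the first 90 observations as the training set (matching n_tr=90 above), whereas cross_val chooses its own partition, and the tie-breaking rule at exactly 0.5 is an assumption.

```r
# Reproducible outcomes of the same form as in the example above
set.seed(1)
out <- rbinom(100, 1, 0.3)

# Assumed split for illustration: first 90 observations train, last 10 test
train <- out[1:90]
test <- out[91:100]

# More likely outcome according to the training set (ties go to 1 here)
majority <- as.integer(mean(train) >= 0.5)

# Accuracy of always predicting that outcome on the test set
baseline_acc <- mean(test == majority)
```

A model whose test accuracy does not exceed baseline_acc is doing no better than ignoring the predictors altogether.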