| cross_val {CARRoT} | R Documentation | 
Cross-validation run
Description
Runs a single cross-validation by partitioning the data into a training set and a test set
Usage
cross_val(
  vari,
  outi,
  c,
  rule,
  part,
  l,
  we,
  vari_col,
  preds,
  mode,
  cmode,
  predm,
  cutoff,
  objfun,
  minx = 1,
  maxx = NULL,
  nr = NULL,
  maxw = NULL,
  st = NULL,
  corr = 1,
  Rsq = F,
  marg = 0,
  n_tr,
  preds_tr
)
Arguments
vari | 
 set of predictors  | 
outi | 
 array of outcomes  | 
c | 
 set of all indices of the predictors  | 
rule | 
 an Events per Variable (EPV) rule, defaults to 10  | 
part | 
 indicates partition of the original data-set into training and test set in a proportion   | 
l | 
 number of observations  | 
we | 
 weights of the predictors  | 
vari_col | 
 overall number of predictors  | 
preds | 
 array to write predictions for the test split into, initially empty  | 
mode | 
 
  | 
cmode | 
 
  | 
predm | 
 
  | 
cutoff | 
 cut-off value for logistic regression  | 
objfun | 
 
  | 
minx | 
 minimum number of predictors to be included in a regression, defaults to 1  | 
maxx | 
 maximum number of predictors to be included in a regression, defaults to the maximum feasible number according to the one in ten rule  | 
nr | 
 a subset of the data-set, such that   | 
maxw | 
 maximum weight of predictors to be included in a regression, defaults to the maximum weight according to the one in ten rule  | 
st | 
 a subset of predictors to be always included in a predictive model, defaults to the empty set  | 
corr | 
 maximum correlation between a pair of predictors in a model  | 
Rsq | 
 whether an R-squared statistic constraint is introduced  | 
marg | 
 margin of error for the R-squared statistic constraint  | 
n_tr | 
 size of the training set  | 
preds_tr | 
 array to write predictions for the training split into, initially empty  | 
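The defaults of maxx and maxw refer to the one in ten rule. As a rough illustration of what such an events-per-variable bound looks like, the sketch below computes it in base R for a binary outcome. It assumes that every predictor has weight 1 and that an "event" is an observation of the rarer outcome class. The names out_tr, events and max_predictors are hypothetical, and the package's own compute_max_weight may differ in detail.

```r
# Generic events-per-variable (EPV) bound in base R.
# This is an illustration only, not CARRoT's internal code.

# Hypothetical training outcomes: 27 events and 63 non-events
out_tr <- rep(c(1, 0), times = c(27, 63))

rule <- 10  # EPV rule: at least 10 events per predictor

# Assumption: the number of events is the size of the rarer class
events <- min(sum(out_tr == 1), sum(out_tr == 0))

# Largest number of unit-weight predictors the rule allows
max_predictors <- floor(events / rule)
max_predictors  # 2
```

With weighted predictors the same budget would presumably apply to the sum of the weights rather than to the count of predictors.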
Value
regr | 
 An M x N matrix of sums of the absolute errors for each element of the test set for each feasible regression. M is the maximum feasible number of variables included in a regression, and N is the maximum feasible number of regressions of a fixed size. The row index indicates the number of variables included in a regression, so each row holds the results of regressions with the same number of variables, and the columns correspond to the different subsets of predictors used.  | 
regrr | 
 An M x N matrix of sums of the relative errors for each element of the test set (only for   | 
nvar | 
 Maximum feasible number of variables in the regression  | 
emp | 
 An accuracy of always predicting the more likely outcome as suggested by the training set (only for   | 
In regr and regrr NA values are possible since for some numbers of variables there are fewer feasible regressions than for the others.
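The shape of such a matrix, and the reason for the NA entries, can be sketched in base R. The example below assumes four unit-weight predictors with every model size from 1 to 4 feasible, and it orders the subsets as combn does. Both are assumptions made for illustration: the package builds its subsets with make_numeric_sets, and its column order may differ. The names p, layout and n_subsets are hypothetical.

```r
# Layout of an M x N result matrix: row k holds models with k predictors.
# Illustration only; cell labels stand in for the error sums.

p <- 4                       # hypothetical number of predictors
M <- p                       # assumption: all sizes 1..p are feasible
n_subsets <- choose(p, 1:M)  # subsets per size: 4 6 4 1
N <- max(n_subsets)          # widest row determines the column count

layout <- matrix(NA_character_, nrow = M, ncol = N)
for (k in 1:M) {
  subsets <- combn(p, k)     # each column is one subset of size k
  layout[k, seq_len(ncol(subsets))] <-
    apply(subsets, 2, paste, collapse = ",")
}

layout
rowSums(is.na(layout))       # 2 0 2 5 unused cells per row
```

Rows with fewer subsets than the widest row are padded with NA, so summaries over regr or regrr should use na.rm = TRUE or an explicit is.na check.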
See Also
Uses compute_max_weight, sum_weights_sub, make_numeric_sets, get_predictions_lin, get_predictions, get_probabilities, AUC, combn
Examples
#creating variables
vari<-matrix(c(1:100,seq(1,300,3)),ncol=2)
#creating outcomes
out<-rbinom(100,1,0.3)
#creating array for predictions
pr<-array(NA,c(2,2))
pr_tr<-array(NA,c(2,2))
#passing the set of indices of the predictors
c<-c(1:2)
#passing the weights of the predictors
we<-c(1,1)
#setting the mode
m<-'binary'
#running the function
cross_val(vari,out,c,10,10,100,we,2,pr,m,'det','exact',0.5,'acc',nr=c(1,4),n_tr=90,preds_tr=pr_tr)
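For comparison with the emp value, a baseline of the same kind can be sketched in base R without calling the package. The sketch assumes a simple random split of l = 100 observations into n_tr = 90 training and 10 test observations. How cross_val itself draws the split is not stated here, and the names train_idx, test_idx, majority and baseline_acc are hypothetical.

```r
# Baseline accuracy of always predicting the outcome that is
# more frequent in the training split. Illustration only.

set.seed(42)                       # for a reproducible split
l <- 100                           # number of observations
n_tr <- 90                         # size of the training set
out <- rbinom(l, 1, 0.3)           # binary outcomes, as in the example

# Assumption: training indices are drawn uniformly at random
train_idx <- sample(l, n_tr)
test_idx <- setdiff(seq_len(l), train_idx)

# More likely outcome according to the training split
majority <- as.integer(mean(out[train_idx]) > 0.5)

# Share of test observations that this constant prediction gets right
baseline_acc <- mean(out[test_idx] == majority)
baseline_acc
```

A model whose test accuracy does not exceed this baseline adds little over the constant prediction.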