R: Cross-validation run

cross_val {CARRoT}

R Documentation

Cross-validation run

Description

Runs a single cross-validation by partitioning the data into a training set and a test set.

Usage

cross_val(
  vari,
  outi,
  c,
  rule,
  part,
  l,
  we,
  vari_col,
  preds,
  mode,
  cmode,
  predm,
  cutoff,
  objfun,
  minx = 1,
  maxx = NULL,
  nr = NULL,
  maxw = NULL,
  st = NULL,
  corr = 1,
  Rsq = F,
  marg = 0,
  n_tr,
  preds_tr
)

Arguments

`vari`	set of predictors
`outi`	array of outcomes
`c`	set of all indices of the predictors
`rule`	an Events per Variable (EPV) rule, defaults to 10
`part`	specifies the partition of the original data set into training and test sets in the proportion `(part-1):1`
`l`	number of observations
`we`	weights of the predictors
`vari_col`	overall number of predictors
`preds`	array to write predictions for the test split into, initially empty
`mode`	`'binary'` (logistic regression), `'multin'` (multinomial regression)
`cmode`	`'det'` or `''`; `'det'` always predicts the more likely outcome as determined by the odds ratio; `''` predicts a given outcome with probability corresponding to its odds ratio (more conservative). Available for multinomial and logistic regression
`predm`	`'exact'` or `''`; for logistic and multinomial regression; `'exact'` computes how many times the exact outcome category was predicted, `''` computes how many times either the exact outcome category or its nearest neighbour was predicted
`cutoff`	cut-off value for logistic regression
`objfun`	`'roc'` for maximising the predictive power with respect to AUC, `'acc'` for maximising predictive power with respect to accuracy.
`minx`	minimum number of predictors to be included in a regression, defaults to 1
`maxx`	maximum number of predictors to be included in a regression, defaults to the maximum feasible number according to the one-in-ten rule
`nr`	a subset of the data-set, such that `1/part` of it lies in the test set and `1-1/part` is in the training set, defaults to empty set
`maxw`	maximum weight of predictors to be included in a regression, defaults to the maximum weight according to the one-in-ten rule
`st`	a subset of predictors to be always included in a predictive model, defaults to empty set
`corr`	maximum correlation between a pair of predictors in a model
`Rsq`	whether a constraint on the R-squared statistic is introduced
`marg`	margin of error for the R-squared statistic constraint
`n_tr`	size of the training set
`preds_tr`	array to write predictions for the training split into, initially empty
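
The relationship between `l`, `part` and `n_tr`, and the difference between the two `cmode` settings, can be sketched in base R. This is an illustration only and does not reproduce the package internals: the names `n_test`, `test_idx`, `train_idx`, `p`, `pred_det` and `pred_rand` are hypothetical, and the two prediction rules are an assumed reading of the `cmode` description for the binary case.

```r
# Illustrative sketch in base R; it does not call or reproduce CARRoT internals.
l    <- 100   # number of observations
part <- 10    # training:test proportion is (part - 1):1, here 9:1

n_test <- l / part      # size of the test split
n_tr   <- l - n_test    # size of the training split

# One random split of the row indices (hypothetical helper names)
set.seed(1)
test_idx  <- sample(l, n_test)
train_idx <- setdiff(seq_len(l), test_idx)

# Hypothetical fitted probabilities of outcome 1 for three test observations
p      <- c(0.2, 0.5, 0.9)
cutoff <- 0.5

# Deterministic rule (assumed reading of cmode = 'det'):
# predict the more likely outcome relative to the cut-off
pred_det <- as.integer(p > cutoff)

# Probabilistic rule (assumed reading of cmode = ''):
# draw each outcome with its fitted probability
pred_rand <- rbinom(length(p), size = 1, prob = p)
```

With `l = 100` and `part = 10` this gives a training set of 90 observations, which matches the `n_tr=90` passed in the example at the end of this page.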

Value

`regr`	An M x N matrix of sums of the absolute errors for each element of the test set for each feasible regression. M is the maximum feasible number of variables included in a regression, and N is the maximum feasible number of regressions of a fixed size; the row index indicates the number of variables included in a regression. Each row therefore corresponds to results obtained from running regressions with the same number of variables, and the columns correspond to the different subsets of predictors used.
`regrr`	An M x N matrix of sums of the relative errors for each element of the test set (only for `mode = 'linear'`) for each feasible regression. M is the maximum feasible number of variables included in a regression, and N is the maximum feasible number of regressions of a fixed size; the row index indicates the number of variables included in a regression. Each row therefore corresponds to results obtained from running regressions with the same number of variables, and the columns correspond to the different subsets of predictors used.
`nvar`	Maximum feasible number of variables in the regression
`emp`	The accuracy of always predicting the more likely outcome as suggested by the training set (only for `mode = 'binary'` and `objfun = 'acc'`)

In `regr` and `regrr`, `NA` values are possible because some numbers of variables admit fewer feasible regressions than others.
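
The M x N layout and its `NA` padding can be illustrated with a small base R sketch. It assumes, purely for illustration, that every subset of predictors is feasible, so that the number of regressions of size k is `choose(p, k)`; in the package, feasibility also depends on the weights, the EPV rule and the correlation constraint. The error values are made-up placeholders, not output of `cross_val`.

```r
# Hypothetical setting: p candidate predictors, regressions of size 1..M
p <- 4
M <- 3

# Number of predictor subsets of each size k = 1..M (assumes all are feasible)
n_sub <- choose(p, 1:M)
N     <- max(n_sub)          # the widest row determines the column count

# M x N matrix; row k holds results for regressions with k predictors
regr <- matrix(NA_real_, nrow = M, ncol = N)
for (k in 1:M) {
  # placeholder error values, made up for illustration only
  regr[k, seq_len(n_sub[k])] <- seq_len(n_sub[k]) + k
}

# Rows with fewer subsets than N keep NA in their trailing columns
na_per_row <- rowSums(is.na(regr))

# Locate the smallest error while ignoring the NA padding
best <- which(regr == min(regr, na.rm = TRUE), arr.ind = TRUE)
```

When post-processing a matrix of this shape, summaries such as `min` or `mean` need `na.rm = TRUE`, otherwise the padding propagates `NA` into the result.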

Examples

#creating variables

vari<-matrix(c(1:100,seq(1,300,3)),ncol=2)

#creating outcomes

out<-rbinom(100,1,0.3)

#creating array for predictions

pr<-array(NA,c(2,2))

pr_tr<-array(NA,c(2,2))

#passing the set of indices of the predictors

c<-c(1:2)

#passing the weights of the predictors

we<-c(1,1)

#setting the mode

m<-'binary'

#running the function

cross_val(vari,out,c,10,10,100,we,2,pr,m,'det','exact',0.5,'acc',nr=c(1,4),n_tr=90,preds_tr=pr_tr)