cross_val {CARRoT}    R Documentation

Cross-validation run

Description

Runs a single cross-validation by partitioning the data into a training set and a test set.

Usage

cross_val(
  vari,
  outi,
  c,
  rule,
  part,
  l,
  we,
  vari_col,
  preds,
  mode,
  cmode,
  predm,
  cutoff,
  objfun,
  minx = 1,
  maxx = NULL,
  nr = NULL,
  maxw = NULL,
  st = NULL,
  corr = 1,
  Rsq = F,
  marg = 0,
  n_tr,
  preds_tr
)

Arguments

vari

set of predictors

outi

array of outcomes

c

set of all indices of the predictors

rule

an Events per Variable (EPV) rule, for example 10 (the one-in-ten rule); no default is set in the Usage above

part

partition parameter: the original data set is split into a training set and a test set in the proportion (part-1):1
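As a rough illustration of the (part-1):1 proportion, the sketch below computes the split sizes for 100 observations and part = 10. The helper split_sizes is hypothetical and not part of CARRoT, and rounding the test share down when l is not divisible by part is an assumption.

```r
# Hypothetical helper, not part of CARRoT: sizes implied by a (part-1):1 split.
split_sizes <- function(l, part) {
  n_test <- floor(l / part)          # assumption: test share is rounded down
  c(train = l - n_test, test = n_test)
}

sizes <- split_sizes(l = 100, part = 10)
print(sizes)                          # 90 training rows, 10 test rows
```

These sizes agree with the values l = 100, part = 10 and n_tr = 90 used in the Examples section.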

l

number of observations

we

weights of the predictors

vari_col

overall number of predictors

preds

array to write predictions for the test split into, initially empty

mode

'binary' (logistic regression), 'multin' (multinomial regression)

cmode

'det' or ''; 'det' always predicts the more likely outcome as determined by the odds ratio; '' predicts a given outcome with probability corresponding to its odds ratio (more conservative). Option available for multinomial and logistic regression
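The difference between the two cmode settings can be sketched for the binary case as follows. The function predict_outcome is hypothetical and is not the package's internal code; it assumes p holds predicted probabilities of outcome 1 from a fitted logistic model.

```r
# Hypothetical helper, not the package's internal code.
# p: predicted probabilities of outcome 1 (an assumption for this sketch).
predict_outcome <- function(p, cmode = "det", cutoff = 0.5) {
  if (cmode == "det") {
    as.integer(p > cutoff)                  # always pick the more likely outcome
  } else {
    rbinom(length(p), size = 1, prob = p)   # draw outcome 1 with probability p
  }
}

p <- c(0.1, 0.4, 0.6, 0.9)
det_pred <- predict_outcome(p, "det")       # deterministic: same result every run
set.seed(1)
rnd_pred <- predict_outcome(p, "")          # randomised: varies with the seed
```

Under 'det' repeated runs give identical predictions, whereas under '' an observation with probability 0.6 is predicted as outcome 1 in only about 60% of draws.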

predm

'exact' or ''; for logistic and multinomial regression; 'exact' computes how many times the exact outcome category was predicted, '' computes how many times either the exact outcome category or its nearest neighbour was predicted
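A minimal sketch of the two counting modes, assuming outcome categories are coded as consecutive integers and that "nearest neighbour" means a category differing by one. The helper count_hits is hypothetical and not part of CARRoT.

```r
# Hypothetical helper: count correct predictions for integer-coded categories.
count_hits <- function(predicted, observed, predm = "exact") {
  tol <- if (predm == "exact") 0 else 1   # '' also accepts adjacent categories
  sum(abs(predicted - observed) <= tol)
}

observed  <- c(1, 2, 3, 3, 1)
predicted <- c(1, 3, 1, 3, 2)
exact_hits <- count_hits(predicted, observed, "exact")  # exact matches only
near_hits  <- count_hits(predicted, observed, "")       # exact or adjacent
```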

cutoff

cut-off value for logistic regression

objfun

'roc' for maximising the predictive power with respect to AUC, 'acc' for maximising predictive power with respect to accuracy.

minx

minimum number of predictors to be included in a regression, defaults to 1

maxx

maximum number of predictors to be included in a regression, defaults to the maximum feasible number according to the one-in-ten rule
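For intuition, an Events per Variable bound for a binary outcome is commonly computed as the count of the less frequent outcome divided by the rule. The sketch below follows that convention; max_predictors is a hypothetical helper, and the package's own computation (see compute_max_weight) may differ, for example by accounting for predictor weights.

```r
# Hypothetical helper illustrating an EPV bound for a 0/1 outcome.
max_predictors <- function(outcome, rule = 10) {
  events <- min(sum(outcome == 1), sum(outcome == 0))  # less frequent outcome
  floor(events / rule)
}

outcome <- c(rep(1, 30), rep(0, 70))
bound <- max_predictors(outcome, rule = 10)  # 30 events / 10 per variable
```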

nr

a subset of the data set such that 1/part of it lies in the test set and 1-1/part of it lies in the training set, defaults to the empty set

maxw

maximum weight of predictors to be included in a regression, defaults to the maximum weight according to the one-in-ten rule

st

a subset of predictors to be always included in a predictive model, defaults to the empty set

corr

maximum correlation between a pair of predictors in a model
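One way to picture this constraint is a check that rejects a candidate set of predictors when any pair is correlated above the threshold. The helper exceeds_corr is hypothetical; the use of absolute Pearson correlation and a strict inequality are assumptions, not confirmed package behaviour.

```r
# Hypothetical check: does any pair of columns exceed the correlation limit?
exceeds_corr <- function(x, corr = 1) {
  cm <- abs(cor(x))                 # absolute pairwise Pearson correlations
  any(cm[upper.tri(cm)] > corr)     # look at each pair once
}

# The two predictors from the Examples section are perfectly collinear.
vari <- matrix(c(1:100, seq(1, 300, 3)), ncol = 2)
blocked_default <- exceeds_corr(vari, corr = 1)    # default limit excludes nothing
blocked_strict  <- exceeds_corr(vari, corr = 0.9)  # tighter limit rejects the pair
```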

Rsq

whether the R-squared statistic constraint is introduced

marg

margin of error for the R-squared statistic constraint

n_tr

size of the training set

preds_tr

array to write predictions for the training split into, initially empty

Value

regr

An M x N matrix of sums of the absolute errors over the elements of the test set, one entry per feasible regression. M is the maximum feasible number of variables included in a regression and N is the maximum number of feasible regressions of a fixed size. The row index is the number of variables included in a regression, so each row holds the results of regressions with the same number of variables, and the columns correspond to different subsets of predictors.

regrr

An M x N matrix of sums of the relative errors over the elements of the test set (only for mode = 'linear'), one entry per feasible regression. M is the maximum feasible number of variables included in a regression and N is the maximum number of feasible regressions of a fixed size. The row index is the number of variables included in a regression, so each row holds the results of regressions with the same number of variables, and the columns correspond to different subsets of predictors.

nvar

Maximum feasible number of variables in the regression

emp

The accuracy of always predicting the more likely outcome as suggested by the training set (only for mode = 'binary' and objfun = 'acc')

In regr and regrr, NA values are possible because some numbers of variables admit fewer feasible regressions than others.
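The shape of regr and regrr, and where the NA entries come from, can be sketched with a simplified count. The code assumes four unweighted predictors, all model sizes feasible, and no st, corr or EPV restrictions, so the number of regressions with k variables is simply choose(4, k); the zeros are placeholders for error sums, not real results.

```r
n_pred <- 4
M <- n_pred                          # assumption: every size 1..4 is feasible
sizes <- seq_len(M)
n_sub <- choose(n_pred, sizes)       # number of predictor subsets of each size
N <- max(n_sub)                      # widest row determines the column count

regr_shape <- matrix(NA_real_, nrow = M, ncol = N)
for (k in sizes) {
  regr_shape[k, seq_len(n_sub[k])] <- 0   # placeholder for an error sum
}
na_per_row <- rowSums(is.na(regr_shape))  # unused cells stay NA
print(regr_shape)
```

Rows with fewer subsets than the widest row are padded with NA, which is why summaries over these matrices typically need na.rm = TRUE.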

See Also

Uses compute_max_weight, sum_weights_sub, make_numeric_sets, get_predictions_lin, get_predictions, get_probabilities, AUC, combn

Examples

#creating variables

vari<-matrix(c(1:100,seq(1,300,3)),ncol=2)

#creating outcomes

out<-rbinom(100,1,0.3)

#creating array for predictions

pr<-array(NA,c(2,2))

pr_tr<-array(NA,c(2,2))

#passing the set of indices of the predictors

c<-c(1:2)

#passing the weights of the predictors

we<-c(1,1)

#setting the mode

m<-'binary'

#running the function

cross_val(vari,out,c,10,10,100,we,2,pr,m,'det','exact',0.5,'acc',nr=c(1,4),n_tr=90,preds_tr=pr_tr)

[Package CARRoT version 3.0.2 Index]