R: Indices of the best regressions

regr_ind {CARRoT}

R Documentation

Indices of the best regressions

Description

One of the two main functions of the package. Identifies the predictors included in the regressions with the highest average predictive power.

Usage

regr_ind(
  vari,
  outi,
  crv,
  cutoff = NULL,
  part = 10,
  mode,
  cmode = "det",
  predm = "exact",
  objfun = "acc",
  parallel = FALSE,
  cores,
  minx = 1,
  maxx = NULL,
  nr = NULL,
  maxw = NULL,
  st = NULL,
  rule = 10,
  corr = 1,
  Rsq = F,
  marg = 0
)

Arguments

`vari`	set of predictors
`outi`	array of outcomes
`crv`	number of cross-validations
`cutoff`	cut-off value for mode `'binary'`
`part`	for each cross-validation, partitions the dataset into a training set and a test set so that `(part-1)/part` of the data is used for training and `1/part` for testing
`mode`	`'binary'` (logistic regression), `'multin'` (multinomial regression), `'linear'` (linear regression)
`cmode`	`'det'` or `''`; `'det'` always predicts the more likely outcome as determined by the odds ratio; `''` predicts each outcome with a probability corresponding to its odds ratio (more conservative). Available for logistic and multinomial regression
`predm`	`'exact'` or `''`; for logistic and multinomial regression; `'exact'` computes how many times the exact outcome category was predicted, `''` computes how many times either the exact outcome category or its nearest neighbour was predicted
`objfun`	`'roc'` for maximising the predictive power with respect to AUC, available only for `mode='binary'`; `'acc'` for maximising predictive power with respect to accuracy.
`parallel`	TRUE if using parallel toolbox, FALSE if not. Defaults to FALSE
`cores`	number of cores to use in case of parallel=TRUE
`minx`	minimum number of predictors to be included in a regression, defaults to 1
`maxx`	maximum number of predictors to be included in a regression, defaults to maximum feasible number according to one in ten rule
`nr`	a subset of the dataset, such that `1/part` of it lies in the test set and `1-1/part` of it lies in the training set; defaults to the empty set. This ensures that elements of this subset appear in both the training and the test set.
`maxw`	maximum weight of predictors to be included in a regression, defaults to maximum weight according to one in ten rule
`st`	a subset of predictors to be always included in a predictive model, defaults to the empty set
`rule`	an Events per Variable (EPV) rule, defaults to 10
`corr`	maximum correlation between a pair of predictors in a model
`Rsq`	whether the R-squared statistics constraint is introduced
`marg`	margin of error for R-squared statistics constraint
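
The effect of `part` and `nr` on a single cross-validation split can be sketched in base R. This is only an illustration of the proportions described above: `split_once` is a hypothetical helper, not a CARRoT function, and the way the package actually draws its samples (including how it rounds when a subset is smaller than `part`) is an assumption here.

```r
# Hypothetical helper, not part of CARRoT: draws one random training/test split
# in which roughly 1/part of all rows, and 1/part of the rows listed in nr,
# land in the test set.
split_once <- function(n, part = 10, nr = integer(0)) {
  # sample() treats a length-one numeric x as 1:x, so index via sample.int()
  pick <- function(x, size) x[sample.int(length(x), size)]

  rest <- setdiff(seq_len(n), nr)
  test_nr <- pick(nr, floor(length(nr) / part))        # share of nr sent to test
  test_rest <- pick(rest, floor(length(rest) / part))  # share of the other rows
  test <- sort(c(test_nr, test_rest))
  list(train = setdiff(seq_len(n), test), test = test)
}

set.seed(1)
s <- split_once(100, part = 10, nr = 1:20)
length(s$test)         # 10 rows held out
length(s$train)        # 90 rows used for fitting
sum(s$test %in% 1:20)  # 2 of the nr rows are in the test set
```

With `part = 10` the training and test sets stand in a 9:1 ratio, and the rows named in `nr` are split in the same ratio rather than left to chance.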

Value

Prints the best predictive power achieved by a regression and the predictive accuracy of the empirical prediction (the value of `emp` computed by `cross_val`, for logistic and linear regression). Returns a list of the indices of the predictors included in the regressions with the highest predictive power. For `mode='linear'`, returns a list of two lists: the first corresponds to the smallest absolute error, the second to the smallest relative error.

Examples

#creating variables for linear regression mode

variables_lin<-matrix(c(rnorm(56,0,1),rnorm(56,1,2)),ncol=2)

#creating outcomes for linear regression mode

outcomes_lin<-rnorm(56,2,1)

#running the function

regr_ind(variables_lin,outcomes_lin,100,mode='linear',parallel=TRUE,cores=2)

#creating variables for binary mode

vari<-matrix(c(1:100,seq(1,300,3)),ncol=2)

#creating outcomes for binary mode

out<-rbinom(100,1,0.3)

#running the function

regr_ind(vari,out,20,cutoff=0.5,part=10,mode='binary',parallel=TRUE,cores=2,nr=c(1,10,20),maxx=1)
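
#sketch of the events-per-variable bound behind rule and maxx

The `rule` and `maxx` arguments refer to the one in ten rule. As a rough illustration for a binary outcome, the general rule of thumb allows one predictor per `rule` events, where the events are the observations in the rarer outcome class. The helper `max_predictors` below is hypothetical and implements only that textbook heuristic; whether CARRoT counts events in exactly this way, and how predictor weights (`maxw`) enter, is not stated on this page.

```r
# Hypothetical helper, not part of CARRoT: largest number of predictors allowed
# by an events-per-variable (EPV) rule for a 0/1 outcome.
max_predictors <- function(outcome, rule = 10) {
  # count both classes (including empty ones) and take the rarer one
  events <- min(table(factor(outcome, levels = c(0, 1))))
  floor(events / rule)
}

out_fixed <- c(rep(1, 30), rep(0, 70))  # 30 events among 100 observations
max_predictors(out_fixed)               # 3
max_predictors(out_fixed, rule = 20)    # 1
```

Under this heuristic, a random outcome such as `rbinom(100,1,0.3)` would typically allow about three predictors at the default `rule = 10`.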