regr_ind {CARRoT}    R Documentation

Indices of the best regressions

Description

One of the two main functions of the package. Identifies the predictors included in the regressions with the highest average predictive power.

Usage

regr_ind(
  vari,
  outi,
  crv,
  cutoff = NULL,
  part = 10,
  mode,
  cmode = "det",
  predm = "exact",
  objfun = "acc",
  parallel = FALSE,
  cores,
  minx = 1,
  maxx = NULL,
  nr = NULL,
  maxw = NULL,
  st = NULL,
  rule = 10,
  corr = 1,
  Rsq = F,
  marg = 0
)

Arguments

vari

set of predictors

outi

array of outcomes

crv

number of cross-validations

cutoff

cut-off value for mode 'binary'

part

for each cross-validation, partitions the data set into a training set and a test set in the proportion (part-1):1, so that a fraction 1/part of the observations forms the test set
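A minimal sketch of such a split, assuming a simple random partition of the row indices. The helper split_train_test is hypothetical and not part of CARRoT; the package may round the test-set size differently.

```r
# Hypothetical helper (not part of CARRoT): split the row indices 1..n into
# a training and a test set in the proportion (part - 1):1, so that
# round(n / part) observations end up in the test set.
split_train_test <- function(n, part = 10) {
  n_test <- round(n / part)
  test <- sample.int(n, n_test)          # random test rows, no replacement
  train <- setdiff(seq_len(n), test)     # all remaining rows
  list(train = sort(train), test = sort(test))
}

set.seed(1)
s <- split_train_test(100, part = 10)
length(s$train)  # 90
length(s$test)   # 10
```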

mode

'linear' (linear regression), 'binary' (logistic regression) or 'multin' (multinomial regression)

cmode

'det' or ''. 'det' always predicts the more likely outcome, as determined by the odds ratio; '' predicts each outcome with a probability corresponding to its odds ratio (more conservative). Available for logistic and multinomial regression
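The difference between the two settings can be illustrated on a matrix of predicted class probabilities (ranking by probability is equivalent to ranking by odds). This is a sketch under the assumption that predictions are drawn per observation; predict_category is a hypothetical helper, not CARRoT's internal code.

```r
# Hypothetical illustration (not CARRoT's internal code) of the two cmode
# settings. probs is a matrix of predicted class probabilities with one row
# per observation and one column per outcome category.
predict_category <- function(probs, cmode = "det") {
  if (cmode == "det") {
    # 'det': always pick the most likely category
    max.col(probs, ties.method = "first")
  } else {
    # '': draw a category at random with its predicted probability
    apply(probs, 1, function(p) sample.int(length(p), 1, prob = p))
  }
}

probs <- rbind(c(0.7, 0.2, 0.1),
               c(0.1, 0.3, 0.6))
predict_category(probs, "det")  # 1 3
set.seed(42)
predict_category(probs, "")     # random; varies with the seed
```

With '' the less likely categories are sometimes predicted, which is why this setting gives a more conservative estimate of predictive power.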

predm

'exact' or ''; for logistic and multinomial regression. 'exact' counts how many times the exact outcome category was predicted; '' counts how many times either the exact outcome category or its nearest neighbour was predicted
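A sketch of the two counting rules, assuming the outcome categories are coded as consecutive integers so that the nearest neighbour of a category is the adjacent code. The helper prediction_accuracy is hypothetical.

```r
# Hypothetical helper (not part of CARRoT): share of correct predictions
# for integer-coded ordered categories. "exact" counts only exact matches;
# "" also accepts the neighbouring category (|predicted - observed| <= 1).
prediction_accuracy <- function(predicted, observed, predm = "exact") {
  tolerance <- if (predm == "exact") 0 else 1
  mean(abs(predicted - observed) <= tolerance)
}

obs  <- c(1, 2, 3, 3, 2)
pred <- c(1, 3, 3, 1, 2)
prediction_accuracy(pred, obs, "exact")  # 0.6
prediction_accuracy(pred, obs, "")       # 0.8
```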

objfun

'roc' for maximising the predictive power with respect to AUC, available only for mode='binary'; 'acc' for maximising predictive power with respect to accuracy.
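For a binary outcome, the two objectives can be computed on a test set as below. The AUC uses the standard rank-based (Mann-Whitney) formula; this is a sketch of the quantities, not CARRoT's implementation, and accuracy and auc are hypothetical helpers.

```r
# Sketch of the two objective functions on a test set (not CARRoT's code):
# accuracy = share of correctly predicted outcomes; AUC computed with the
# rank-based (Mann-Whitney) formula for a binary outcome y and scores p.
accuracy <- function(pred, y) mean(pred == y)

auc <- function(p, y) {
  r <- rank(p)                      # average ranks handle ties
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

y <- c(0, 0, 1, 1)
p <- c(0.1, 0.4, 0.35, 0.8)
auc(p, y)                           # 0.75
accuracy(as.integer(p >= 0.5), y)   # 0.75
```

Accuracy depends on the cut-off used to turn probabilities into classes, whereas the AUC summarises the ranking of the scores over all cut-offs.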

parallel

TRUE to run the computations in parallel, FALSE otherwise. Defaults to FALSE

cores

number of cores to use when parallel=TRUE

minx

minimum number of predictors to be included in a regression, defaults to 1

maxx

maximum number of predictors to be included in a regression; defaults to the maximum feasible number according to the one-in-ten rule

nr

indices of a subset of the data set that is split so that a fraction 1/part of it lies in the test set and 1-1/part in the training set; defaults to the empty set. This ensures that elements of this subset are represented in both the training and the test set.
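A sketch of a split that respects such a subset: nr and the remaining rows are each divided in the proportion (part-1):1. The helper split_with_subset is hypothetical, and it assumes nr contains at least two indices (otherwise its single element cannot appear in both sets).

```r
# Hypothetical helper (not CARRoT's code): split rows 1..n so that the
# special subset nr is itself divided in the proportion (part - 1):1.
# Assumes length(nr) >= 2 so that nr is represented in both sets.
split_with_subset <- function(n, nr, part = 10) {
  others <- setdiff(seq_len(n), nr)
  pick <- function(idx) {
    k <- max(1, round(length(idx) / part))   # at least one test element
    idx[sample.int(length(idx), k)]          # safe even if length(idx) == 1
  }
  test <- c(pick(nr), pick(others))
  list(train = setdiff(seq_len(n), test), test = sort(test))
}

set.seed(3)
s <- split_with_subset(100, nr = c(1, 10, 20), part = 10)
length(s$test)  # 11: one element of nr plus 10 other rows
```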

maxw

maximum weight of predictors to be included in a regression; defaults to the maximum weight according to the one-in-ten rule

st

a subset of predictors to be always included in a predictive model; defaults to the empty set
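Together, minx, maxx and st define the family of predictor subsets that are compared. The sketch below enumerates that family with combn; candidate_subsets is a hypothetical helper, and it ignores the weight and correlation constraints that the function also applies.

```r
# Hypothetical helper (not CARRoT's code): list every candidate predictor
# subset with between minx and maxx predictors in total, where the
# predictors in st are always included.
candidate_subsets <- function(p, minx = 1, maxx = p, st = integer(0)) {
  free <- setdiff(seq_len(p), st)
  out <- list()
  for (size in minx:maxx) {
    k <- size - length(st)               # free predictors still to add
    if (k < 0 || k > length(free)) next
    if (k == 0) {
      out <- c(out, list(sort(st)))
    } else {
      combos <- combn(length(free), k)   # columns index into free
      for (j in seq_len(ncol(combos))) {
        out <- c(out, list(sort(c(st, free[combos[, j]]))))
      }
    }
  }
  out
}

res <- candidate_subsets(4, minx = 2, maxx = 3, st = 1)
length(res)  # 6: {1,2} {1,3} {1,4} {1,2,3} {1,2,4} {1,3,4}
```

Indexing free through combn(length(free), k) avoids the special case where combn treats a single integer as a range.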

rule

an events per variable (EPV) rule; defaults to 10
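A common reading of the EPV rule for a categorical outcome bounds the number of predictors by the size of the smallest outcome class divided by rule. This is a sketch of that reading only; CARRoT's own bound (and its weighted variant used for maxw) may be computed differently, and max_predictors_epv is hypothetical.

```r
# Sketch of a common reading of the EPV rule (not necessarily CARRoT's
# exact formula): the number of predictors may not exceed the size of the
# smallest outcome class divided by rule.
max_predictors_epv <- function(outcome, rule = 10) {
  floor(min(table(outcome)) / rule)
}

y <- c(rep(1, 30), rep(0, 70))
max_predictors_epv(y)             # 3
max_predictors_epv(y, rule = 20)  # 1
```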

corr

maximum correlation between a pair of predictors in a model; defaults to 1
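A sketch of such a check for one candidate model, assuming the constraint is applied to the absolute Pearson correlation of every pair of included predictors. passes_corr is a hypothetical helper, not part of CARRoT.

```r
# Hypothetical helper (not CARRoT's code): TRUE if every pair of predictors
# in the column subset idx of matrix x has |correlation| <= corr.
passes_corr <- function(x, idx, corr = 1) {
  if (length(idx) < 2) return(TRUE)
  cm <- abs(cor(x[, idx, drop = FALSE]))
  all(cm[upper.tri(cm)] <= corr)
}

x <- cbind(a = 1:10, b = 2 * (1:10), c = c(5, 3, 8, 1, 9, 2, 7, 4, 10, 6))
passes_corr(x, c(1, 2), corr = 0.9)  # FALSE: a and b are perfectly correlated
passes_corr(x, c(1, 3), corr = 0.9)  # TRUE
```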

Rsq

whether the R-squared statistic constraint is imposed; defaults to FALSE

marg

margin of error for the R-squared statistic constraint; defaults to 0

Value

Prints the best predictive power achieved by a regression and the predictive accuracy of the empirical prediction (the value of emp computed by cross_val for logistic and linear regression). Returns a list of the indices of the predictors included in the regressions with the highest predictive power. For mode='linear', returns a list of two lists: the first corresponds to the smallest absolute error, the second to the smallest relative error.
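For mode='linear' the two lists rank regressions by different error measures. The definitions below (mean absolute error and mean absolute error relative to the observed value) are assumptions for illustration; the helpers are hypothetical and CARRoT may define the errors differently.

```r
# Sketch (assumed definitions, not taken from CARRoT): mean absolute error
# and mean relative error of predictions for a continuous outcome.
mean_abs_error <- function(pred, obs) mean(abs(pred - obs))
mean_rel_error <- function(pred, obs) mean(abs(pred - obs) / abs(obs))

obs  <- c(2, 4, 5)
pred <- c(1, 5, 5)
mean_abs_error(pred, obs)  # 0.6666667
mean_rel_error(pred, obs)  # 0.25
```

The same error of 1 weighs twice as much relative to an observation of 2 as to one of 4, so the two measures can favour different predictor sets.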

See Also

Uses compute_weights, make_numeric, compute_max_weight, compute_max_length, cross_val, av_out, get_indices

Examples

#creating variables for linear regression mode

variables_lin<-matrix(c(rnorm(56,0,1),rnorm(56,1,2)),ncol=2)

#creating outcomes for linear regression mode

outcomes_lin<-rnorm(56,2,1)

#running the function

regr_ind(variables_lin,outcomes_lin,100,mode='linear',parallel=TRUE,cores=2)

#creating variables for binary mode

vari<-matrix(c(1:100,seq(1,300,3)),ncol=2)

#creating outcomes for binary mode

out<-rbinom(100,1,0.3)

#running the function

regr_ind(vari,out,20,cutoff=0.5,part=10,mode='binary',parallel=TRUE,cores=2,nr=c(1,10,20),maxx=1)

[Package CARRoT version 3.0.2 Index]