ecv.regression {EnsembleCV}    R Documentation

Cross-Validation-Based Integration of Regression Base Learners for Ensemble Learning

Description

This function uses repeated cross-validation to find the base learner configuration with the smallest error. It then trains the chosen model (base learner and configuration) on the full data set and returns it.
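The selection idea can be sketched in base R without the package. This is a minimal illustration, not the package's implementation: polynomial lm() fits stand in for base learner configurations, and the names degrees, n.repeat, n.fold, partitions and cv.rmse are hypothetical. Every candidate is scored on the same repeated random partitions, the one with the smallest average error is chosen, and it is then refit on all rows.

```r
# Illustrative only: lm() polynomials stand in for base learner configurations.
set.seed(1)
n <- 120
x <- runif(n, -2, 2)
dat <- data.frame(x = x, y = 1 + 2 * x - 1.5 * x^2 + rnorm(n, sd = 0.5))

degrees <- 1:4   # hypothetical candidate configurations
n.repeat <- 3    # number of random partitions
n.fold <- 5      # folds per partition

# Random fold labels, generated once and shared by all configurations
partitions <- replicate(
  n.repeat, sample(rep(seq_len(n.fold), length.out = n)), simplify = FALSE
)

# Cross-validated RMSE of one configuration on one partition
cv.rmse <- function(degree, fold) {
  sq.err <- numeric(nrow(dat))
  for (k in seq_len(n.fold)) {
    fit <- lm(y ~ poly(x, degree), data = dat[fold != k, ])
    held.out <- dat[fold == k, ]
    sq.err[fold == k] <- (held.out$y - predict(fit, held.out))^2
  }
  sqrt(mean(sq.err))
}

# Average each configuration's error over the repeated partitions
avg.err <- sapply(degrees, function(d) {
  mean(sapply(partitions, function(f) cv.rmse(d, f)))
})
best <- degrees[which.min(avg.err)]

# Refit the selected configuration on the full data set
final.fit <- lm(y ~ poly(x, best), data = dat)
```

Sharing the partitions across candidates keeps the comparison fair, since every configuration is evaluated on identical held-out rows.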

Usage

ecv.regression(formula, data
  , baselearner.control = ecv.regression.baselearner.control()
  , integrator.control = ecv.regression.integrator.control()
  , ncores = 1, filemethod = FALSE, print.level = 1
  , preschedule = TRUE
  , schedule.method = c("random", "as.is", "task.length")
  , task.length
)

Arguments

formula

Formula expressing response variable and covariates.

data

Data frame containing the response variable and covariates.

baselearner.control

Control structure determining the base learners, their configurations, and data partitioning details. See ecv.regression.baselearner.control.

integrator.control

Control structure governing integrator behavior. See ecv.regression.integrator.control.

ncores

Number of cores used for parallel training of base learners.

filemethod

Boolean flag indicating whether to save estimation objects to disk. Using filemethod=TRUE reduces RAM pressure.

print.level

Controls the verbosity level.

preschedule

Boolean flag indicating whether base learner training jobs are scheduled statically (TRUE) or dynamically (FALSE).

schedule.method

Method used for scheduling tasks on threads. With "as.is", static scheduling assigns tasks to threads in round-robin fashion, and dynamic scheduling queues tasks in their original order. With "random", tasks are first shuffled randomly and then handled as in "as.is". With "task.length", static scheduling uses a heuristic algorithm to assign tasks to threads so as to minimize load imbalance, i.e., to make the total task length roughly equal across threads, and dynamic scheduling sorts tasks in descending order of expected length to form the task queue.

task.length

Vector of estimated task lengths, to be used in the "task.length" method of scheduling.
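The scheduling methods described for schedule.method can be illustrated with a small base R sketch. The documentation does not say which heuristic "task.length" uses in static scheduling; the greedy longest-first rule below is an assumption chosen for illustration, and all function names (assign.round.robin, assign.greedy, thread.load) and the task lengths are hypothetical.

```r
# Hypothetical estimated task lengths (arbitrary units) and thread count
task.length <- c(8, 1, 7, 2, 6, 3, 5, 4)
nthread <- 2

# "as.is", static: task i goes to thread ((i - 1) mod nthread) + 1
assign.round.robin <- function(len, nthread) {
  rep(seq_len(nthread), length.out = length(len))
}

# "task.length", static: one possible heuristic (an assumption, not
# necessarily the package's): visit tasks from longest to shortest and
# give each one to the thread with the smallest load so far
assign.greedy <- function(len, nthread) {
  load <- numeric(nthread)
  thread <- integer(length(len))
  for (i in order(len, decreasing = TRUE)) {
    thread[i] <- which.min(load)
    load[thread[i]] <- load[thread[i]] + len[i]
  }
  thread
}

# Total estimated work per thread under a given assignment
thread.load <- function(len, thread, nthread) {
  as.numeric(tapply(len, factor(thread, levels = seq_len(nthread)), sum))
}

load.rr <- thread.load(task.length,
                       assign.round.robin(task.length, nthread), nthread)
load.greedy <- thread.load(task.length,
                           assign.greedy(task.length, nthread), nthread)
load.rr      # 26 10
load.greedy  # 18 18

# "random": shuffle the tasks first, then proceed as in "as.is"
shuffled <- sample(task.length)

# "task.length", dynamic: queue order, longest expected task first
queue.order <- order(task.length, decreasing = TRUE)
```

With these made-up lengths, round-robin leaves one thread with 26 units of work and the other with 10, while the greedy rule balances them at 18 each, which is the kind of imbalance the "task.length" method aims to reduce.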

Value

An object of class ecv.regression (and also of class ecv.file if filemethod==TRUE): a list with the following elements:

call

Copy of function call.

formula

Copy of formula argument in function call.

instance.list

An object of class Instance.List, containing all combinations of base learner configurations and random data partitions generated in the body of the function.

integrator.config

Copy of configuration object passed to the integrator. Object of class Regression.Select.MinAvgErr.Config.

method

Integration method. Currently, only "default" is supported.

est

A list with these elements: 1) baselearner.cv.batch, an object of class Regression.CV.Batch.FitObj containing the fit object from CV batch training of base learners; 2) baselearner.batch, an object of class Regression.Batch.FitObj containing the fit object from batch training of base learners on entire data; 3) integrator, an object of class Regression.Select.MinAvgErr.FitObj containing the fit object returned by the integrator.

y

Copy of response variable vector.

pred

Within-sample prediction of the ensemble model.

filemethod

Copy of passed-in filemethod argument.
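As a small usage sketch, the y and pred elements listed above can be combined into a within-sample error measure. The helper insample.rmse is hypothetical, and it assumes that pred is a numeric vector aligned with y. The list fake.fit holds made-up numbers and only stands in for a fitted object so that the snippet runs without the package; a real object would come from ecv.regression().

```r
# Within-sample RMSE from the documented 'y' and 'pred' elements
# (hypothetical helper; assumes 'pred' is aligned with 'y')
insample.rmse <- function(fit) sqrt(mean((fit$y - fit$pred)^2))

# Stand-in list with made-up numbers, used only to keep the sketch
# runnable without the package
fake.fit <- list(y = c(1, 2, 3, 4), pred = c(1.5, 2, 2.5, 4))
rmse <- insample.rmse(fake.fit)
```

Because this error is measured on the training rows, it is typically more optimistic than the cross-validated error used for selection.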

Author(s)

Mansour T.A. Sharabiani, Alireza S. Mahani

See Also

ecv.regression.baselearner.control, ecv.regression.integrator.control, Instance.List, Regression.Select.MinAvgErr.Config, Regression.CV.Batch.FitObj, Regression.Batch.FitObj, Regression.Select.MinAvgErr.FitObj

Examples

data(servo)
myformula <- class~motor+screw+pgain+vgain
## split the rows at random: 70% for training, the rest for prediction
perc.train <- 0.7
index.train <- sample(seq_len(nrow(servo)), size = round(perc.train*nrow(servo)))
data.train <- servo[index.train,]
data.predict <- servo[-index.train,]
## to run longer test using all 5 default regression base learners
## try: est <- ecv.regression(myformula, data.train, ncores=2)
## shorter run restricted to the "knn" base learner
est <- ecv.regression(myformula, data.train, ncores=2
  , baselearner.control = 
      ecv.regression.baselearner.control(baselearners = c("knn")))
## out-of-sample prediction on the held-out rows
newpred <- predict(est, data.predict)

[Package EnsembleCV version 0.8]