bag {caret}

R Documentation

A General Framework For Bagging

Description

bag provides a framework for bagging classification or regression models. The user can provide their own functions for model building, prediction and aggregation of predictions (see Details below).

Usage

bag(x, ...)

bagControl(
  fit = NULL,
  predict = NULL,
  aggregate = NULL,
  downSample = FALSE,
  oob = TRUE,
  allowParallel = TRUE
)

## Default S3 method:
bag(x, y, B = 10, vars = ncol(x), bagControl = NULL, ...)

## S3 method for class 'bag'
predict(object, newdata = NULL, ...)

## S3 method for class 'bag'
print(x, ...)

## S3 method for class 'bag'
summary(object, ...)

## S3 method for class 'summary.bag'
print(x, digits = max(3, getOption("digits") - 3), ...)

ldaBag

plsBag

nbBag

ctreeBag

svmBag

nnetBag

Arguments

`x`	a matrix or data frame of predictors
`...`	arguments to pass to the model function
`fit`	a function that has arguments `x`, `y` and `...` and produces a model object that can later be used for prediction. Example functions are found in `ldaBag`, `plsBag`, `nbBag`, `svmBag` and `nnetBag`.
`predict`	a function that generates predictions for each sub-model. The function should have arguments `object` and `x`. The output of the function can be any type of object (see the example below, where posterior probabilities are generated). Example functions are found in `ldaBag`, `plsBag`, `nbBag`, `svmBag` and `nnetBag`.
`aggregate`	a function with arguments `x` and `type`. The function takes the output of the `predict` function and reduces the bagged predictions to a single prediction per sample. The `type` argument can be used to switch between predicting classes or class probabilities for classification models. Example functions are found in `ldaBag`, `plsBag`, `nbBag`, `svmBag` and `nnetBag`.
`downSample`	logical: for classification, should the data set be randomly sampled so that each class has the same number of samples as the smallest class?
`oob`	logical: should out-of-bag statistics be computed and the predictions retained?
`allowParallel`	logical: if a parallel backend is loaded and available, should the function use it?
`y`	a vector of outcomes
`B`	the number of bootstrap samples to train over.
`vars`	an integer. If this argument is not `NULL`, a random sample of size `vars` is taken of the predictors in each bagging iteration. If `NULL`, all predictors are used.
`bagControl`	a list of options.
`object`	an object of class `bag`.
`newdata`	a matrix or data frame of samples for prediction. Note that this argument must have a non-null value.
`digits`	minimal number of significant digits.
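
The `downSample` option described above can be pictured with a small base R sketch. This only illustrates the idea of sampling every class down to the size of the smallest class; it is not taken from the package source, and the data and the names `y`, `min_n` and `keep` are made up for the example.

```r
# Illustrative only: down-sample each class to the size of the smallest class.
set.seed(3)
y <- factor(c(rep("a", 10), rep("b", 4), rep("c", 6)))

# Size of the smallest class
min_n <- min(table(y))

# For each class, keep a random subset of min_n row indices.
# sample.int() is used on positions to avoid the sample() pitfall
# when a class has a single row.
keep <- unlist(
  lapply(split(seq_along(y), y),
         function(idx) idx[sample.int(length(idx), min_n)]),
  use.names = FALSE
)

# Every class now has the same number of samples
table(y[keep])
```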

Format

An object of class list of length 3.

Details

The function is a framework into which users can plug any model to assess the effect of bagging. Example functions can be found in ldaBag, plsBag, nbBag, svmBag and nnetBag. Each has elements fit, pred and aggregate.
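
As a rough sketch of what such a plug-in list might look like, the code below builds a hypothetical `lm_bag` list around `lm()` with the same three elements (fit, pred and aggregate). The list name, the simulated data and the choice to average predictions are illustrative assumptions. The sketch also assumes that the aggregate function receives a list with one set of predictions per sub-model. The bagging loop is written by hand in base R to show how the three functions cooperate; the call to `bag` itself only runs if caret is installed.

```r
# Hypothetical plug-in functions built on lm(); not part of caret.
lm_bag <- list(
  # fit: takes predictors x, outcomes y and ..., returns a model object
  fit = function(x, y, ...) {
    dat <- as.data.frame(x)
    dat$.outcome <- y
    lm(.outcome ~ ., data = dat, ...)
  },
  # pred: takes the model object and new predictors x
  pred = function(object, x) {
    unname(predict(object, newdata = as.data.frame(x)))
  },
  # aggregate: assumes x is a list with one prediction vector per sub-model;
  # averaging is an illustrative choice
  aggregate = function(x, type = NULL) {
    rowMeans(do.call(cbind, x))
  }
)

# Simulated regression data
set.seed(1)
x <- data.frame(a = rnorm(50), b = rnorm(50))
y <- 2 * x$a - x$b + rnorm(50, sd = 0.1)

# Hand-written bagging loop: fit on bootstrap samples, predict, then aggregate
B <- 5
preds <- lapply(seq_len(B), function(i) {
  idx <- sample(nrow(x), replace = TRUE)
  model <- lm_bag$fit(x[idx, , drop = FALSE], y[idx])
  lm_bag$pred(model, x)
})
bagged_pred <- lm_bag$aggregate(preds)

# The same functions handed to bag(), if caret is available
if (requireNamespace("caret", quietly = TRUE)) {
  bagged <- caret::bag(x, y, B = 5,
                       bagControl = caret::bagControl(fit = lm_bag$fit,
                                                      predict = lm_bag$pred,
                                                      aggregate = lm_bag$aggregate))
  caret_pred <- predict(bagged, newdata = x)
}
```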

One note: when vars is not NULL, the sub-setting occurs before the fit and predict functions are called. As a result, the user probably does not need to account for the change in predictors in their functions.
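
The effect of this sub-setting can be illustrated in base R. The sketch below is not the package's internal code; `fit_fun`, `pred_fun` and the simulated data are hypothetical. It shows that when the same random columns are selected before both the fit and the predict call, the user-supplied functions only ever see a consistent set of predictors.

```r
# Illustrative only: column sub-setting happens outside the user functions.
set.seed(2)
x <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20), d = rnorm(20))
y <- x$a - x$b + rnorm(20, sd = 0.1)

# Hypothetical user functions that know nothing about `vars`
fit_fun <- function(x, y, ...) {
  dat <- as.data.frame(x)
  dat$.outcome <- y
  lm(.outcome ~ ., data = dat)
}
pred_fun <- function(object, x) {
  unname(predict(object, newdata = as.data.frame(x)))
}

# Draw a random sample of `vars` predictors for this bagging iteration
vars <- 2
keep <- sample(seq_len(ncol(x)), vars)

# The same columns are passed to both functions
sub_fit  <- fit_fun(x[, keep, drop = FALSE], y)
sub_pred <- pred_fun(sub_fit, x[, keep, drop = FALSE])

# The model has an intercept plus one coefficient per sampled predictor
names(coef(sub_fit))
```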

When using bag with train, classification models should use type = "prob" inside of the predict function so that predict.train(object, newdata, type = "prob") will work.
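
To make the role of class probabilities concrete, here is a minimal sketch of an aggregate function that pools per-model probabilities and switches on `type`. The function `avg_aggregate`, the class labels and the probability values are hypothetical. Averaging is an illustrative pooling rule and is not necessarily what `ldaBag$aggregate` does. The sketch also assumes the aggregate function receives a list with one probability table per sub-model.

```r
# Hypothetical aggregate function: pool class probabilities by averaging.
avg_aggregate <- function(x, type = "class") {
  classes <- colnames(x[[1]])
  # Element-wise mean of the per-model probability tables
  pooled <- Reduce(`+`, lapply(x, as.matrix)) / length(x)
  if (type == "class") {
    # Pick the class with the largest pooled probability for each sample
    factor(classes[max.col(pooled, ties.method = "first")], levels = classes)
  } else {
    as.data.frame(pooled)
  }
}

# Made-up probabilities from two sub-models for two samples
p1 <- data.frame(Active = c(0.9, 0.4), Inactive = c(0.1, 0.6))
p2 <- data.frame(Active = c(0.7, 0.2), Inactive = c(0.3, 0.8))

pooled_prob  <- avg_aggregate(list(p1, p2), type = "prob")
pooled_class <- avg_aggregate(list(p1, p2), type = "class")
```

Because the sub-model predictions are probabilities rather than hard class labels, the same pooled table can serve both the class and the probability output.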

If a parallel backend is registered, the foreach package is used to train the models in parallel.

Value

bag produces an object of class bag with elements

`fits`	a list with two sub-objects: the `fit` object has the actual model fit for that bagged sample and the `vars` object is either `NULL` or a vector of integers corresponding to which predictors were sampled for that model
`control`	a mirror of the arguments passed into `bagControl`
`call`	the call
`B`	the number of bagging iterations
`dims`	the dimensions of the training set

Author(s)

Max Kuhn

Examples

library(caret)

## A simple example of bagging conditional inference regression trees:
data(BloodBrain)

## treebag <- bag(bbbDescr, logBBB, B = 10,
##                bagControl = bagControl(fit = ctreeBag$fit,
##                                        predict = ctreeBag$pred,
##                                        aggregate = ctreeBag$aggregate))




## An example of pooling posterior probabilities to generate class predictions
data(mdrr)

## remove near-zero-variance predictors and highly correlated predictors
mdrrDescr <- mdrrDescr[, -nearZeroVar(mdrrDescr)]
mdrrDescr <- mdrrDescr[, -findCorrelation(cor(mdrrDescr), .95)]

## basicLDA <- train(mdrrDescr, mdrrClass, "lda")

## bagLDA2 <- train(mdrrDescr, mdrrClass,
##                  "bag",
##                  B = 10,
##                  bagControl = bagControl(fit = ldaBag$fit,
##                                          predict = ldaBag$pred,
##                                          aggregate = ldaBag$aggregate),
##                  tuneGrid = data.frame(vars = c((1:10)*10 , ncol(mdrrDescr))))

[Package caret version 6.0-94 Index]