bagging.pltr {GPLTR}    R Documentation

Bagging PLTR models

Description

Bagging procedure to aggregate several PLTR models for accurate prediction and variable selection.

Usage

bagging.pltr(xdata, Y.name, X.names, G.names, family = "binomial",
             args.rpart, epsi = 0.001, iterMax = 5, iterMin = 3, LB = FALSE,
             args.parallel = list(numWorkers = 1),
             Bag = 20, Pred_Data = data.frame(), verbose = TRUE, doprune = FALSE,
             thresshold = seq(0, 1, by = 0.1))

Arguments

xdata

the learning data frame

Y.name

the name of the binary dependent variable

X.names

the names of independent variables to consider in the linear part of the glm and as offset in the tree part

G.names

the names of independent variables to consider in the tree part of the hybrid glm.

family

the glm family, chosen according to the type of the dependent variable (only the binomial family is currently supported by this function).

args.rpart

a list of options that control details of the rpart algorithm. minbucket: the minimum number of observations in any terminal <leaf> node; cp: the complexity parameter (any split that does not decrease the overall lack of fit by a factor of cp is not attempted); maxdepth: the maximum depth of any node of the final tree, with the root node counted as depth 0. See rpart.control for further details.

epsi

a threshold value used to check the convergence of the algorithm

iterMax

the maximum number of iterations to consider

iterMin

the minimum number of iterations to consider

LB

a binary indicator with values TRUE or FALSE indicating whether load balancing is used in the parallel computation. It has no effect on a Windows platform. See mclapply.

args.parallel

a list of two elements containing the number of workers and the type of parallelization to achieve; see mclapply.

Bag

The number of Bagging samples to consider

Pred_Data

An optional data frame to validate the bagging procedure (the test dataset)

verbose

Logical; TRUE for printing progress during the computation (helpful for debugging)

doprune

a binary indicator with values TRUE or FALSE indicating whether the trees in the bagging procedure are pruned (by a BIC procedure) or not

thresshold

a vector of numerical values between 0 and 1 used as threshold values for the computation of the OOB error rate

Details

For the bagging procedure, it is mandatory to set maxcompete = 0 and maxsurrogate = 0 within the rpart arguments. This ensures the correct calculation of variable importance.
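
As a minimal sketch of this constraint, the option list can be completed before it is passed as args.rpart. The values of minbucket, maxdepth and cp below are illustrative assumptions, not recommendations; only maxcompete = 0 and maxsurrogate = 0 come from the requirement above. Because rpart.control defaults to maxcompete = 4 and maxsurrogate = 5, both have to be set explicitly.

```r
# Illustrative starting options (assumed values, not recommendations)
args.rpart <- list(minbucket = 10, maxdepth = 4, cp = 0)

# Add (or overwrite) the two settings required by the bagging procedure
args.rpart <- modifyList(args.rpart,
                         list(maxcompete = 0, maxsurrogate = 0))

# Fail early if either setting is missing or non-zero
stopifnot(identical(args.rpart$maxcompete, 0),
          identical(args.rpart$maxsurrogate, 0))
```

Using modifyList keeps any other options already present in the list and only touches the two required entries.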

Value

A list with eleven elements

IND_OOB

A list of length Bag containing the Out Of Bag (OOB) individuals for each PLTR model.

EOOB

The vector of OOB errors of the bagging procedure for each threshold value.

OOB_ERRORS_PBP

A matrix with one column per bagging sample (Bag columns) and one row per threshold value, containing the OOB error of each PLTR model in the bagging sequence for each threshold value.

OOB_ERROR_PBP

A vector containing the mean of OOB_ERRORS_PBP for each threshold value.

Tree_BAG

A list of length Bag containing the bagging trees

Glm_BAG

A list of length Bag containing the bagging PLTR models; could be helpful for prediction on new features.

LOST

The 0/1 loss matrix for OOB observations at each threshold value

TEST

NULL if Pred_Data is not available. Otherwise, a list of three elements: PRED_ERROR: the estimated error of the bagging procedure on the test sample for each threshold value; PRED_IND: a list with one element per threshold value, each element containing a matrix with the predictions for the test data individuals from each PLTR model of the bagging sequence (column by column); FINAL_PRED_IND: a list containing the final prediction of each individual of the test data by the bagging procedure (the modal prediction) for each threshold value.

Var_IMP

A numeric vector containing the relative variable importance of the bagging procedure

Timediff

The execution time of the bagging procedure

CUT

The threshold value used inside the bagging procedure
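
A short sketch of how the returned OOB errors might be used to pick a threshold. It does not call bagging.pltr: mock_fit is a hypothetical stand-in for the returned list, its error values are made up purely for illustration, and the code assumes that EOOB is stored in the same order as the thresshold vector.

```r
# Threshold grid, identical to the default of the thresshold argument
thresshold <- seq(0, 1, by = 0.1)

# Hypothetical stand-in for the list returned by bagging.pltr();
# the error values are invented for illustration only
mock_fit <- list(
  EOOB = c(0.50, 0.41, 0.33, 0.27, 0.22, 0.19, 0.21, 0.26, 0.34, 0.42, 0.50)
)

# Position of the smallest OOB error (first one in case of ties)
best <- which.min(mock_fit$EOOB)

# Threshold value at that position, assuming EOOB follows the grid order
best_cut <- thresshold[best]
```

With a real fit, mock_fit would be replaced by the object returned by bagging.pltr, and the alignment between EOOB and thresshold should be checked before relying on the result.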

Author(s)

Cyprien Mbogning

References

Mbogning, C., Perdry, H., Broet, P.: A Bagged partially linear tree-based regression procedure for prediction and variable selection. Human Heredity (To appear) (2015)

Leo Breiman: Bagging Predictors. Machine Learning, 24, 123-140 (1996)

See Also

predict_bagg.pltr

Examples

## Not run: 
## load the data set

data(burn)

## set the parameters 

## maxcompete = 0 and maxsurrogate = 0 are both required (see Details)
args.rpart <- list(minbucket = 10, maxdepth = 4, cp = 0,
                   maxcompete = 0, maxsurrogate = 0)
family <- "binomial"
Y.name <- "D2"
X.names <- "Z2"
G.names <- c('Z1','Z3','Z4','Z5','Z6','Z7','Z8','Z9','Z10','Z11')
args.parallel = list(numWorkers = 1)
                     
## Bagging a set of basic unpruned pltr predictors

Bag.burn <-  bagging.pltr(burn, Y.name, X.names, G.names, family, 
             args.rpart,epsi = 0.01, iterMax = 4, iterMin = 3, 
             Bag = 20, verbose = FALSE, doprune = FALSE)

## End(Not run)

[Package GPLTR version 1.5 Index]