bagging.pltr {GPLTR} | R Documentation |
Bagging PLTR models
Description
Bagging procedure to aggregate several PLTR models for accurate prediction and variable selection
Usage
bagging.pltr(xdata, Y.name, X.names, G.names, family = "binomial",
args.rpart, epsi = 0.001, iterMax = 5, iterMin = 3, LB = FALSE,
args.parallel = list(numWorkers = 1),
Bag = 20, Pred_Data = data.frame(), verbose = TRUE, doprune = FALSE,
thresshold = seq(0, 1, by = 0.1))
Arguments
xdata |
the learning data frame |
Y.name |
the name of the binary dependent variable |
X.names |
the names of independent variables to consider in the linear part of the glm and as offset in the tree part |
G.names |
the names of independent variables to consider in the tree part of the hybrid glm. |
family |
the glm family to use, depending on the type of the dependent variable (only the binomial family currently works in this function). |
args.rpart |
a list of options that control details of the rpart algorithm. |
epsi |
a threshold value used to check the convergence of the algorithm |
iterMax |
the maximum number of iterations to consider |
iterMin |
the minimum number of iterations to consider |
LB |
a binary indicator with values TRUE or FALSE indicating whether the load is balanced in the parallel computation. It has no effect on a Windows platform. See |
args.parallel |
a list of two elements containing the number of workers and the type of parallelization to achieve; see |
Bag |
The number of Bagging samples to consider |
Pred_Data |
An optional data frame to validate the bagging procedure (the test dataset) |
verbose |
Logical; TRUE for printing progress during the computation (helpful for debugging) |
doprune |
a binary indicator with values TRUE or FALSE indicating whether the trees in the bagging procedure are pruned (by a |
thresshold |
a vector of numerical values between 0 and 1 used as threshold values for the computation of the OOB error rate |
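To illustrate what an error rate "for each threshold value" can look like, here is a minimal base R sketch. The outcome vector `y`, the probabilities `prob`, and the rule "classify as 1 when the probability exceeds the cut-off" are all assumptions made for illustration; the exact rule used inside bagging.pltr is not stated on this page.

```r
# Hypothetical inputs: observed 0/1 outcomes and predicted probabilities.
y    <- c(0, 0, 1, 1, 1, 0)
prob <- c(0.12, 0.42, 0.35, 0.83, 0.95, 0.55)

# Same grid as the default value of the `thresshold` argument.
thresshold <- seq(0, 1, by = 0.1)

# Assumed rule: predict 1 when the probability exceeds the cut-off,
# then take the misclassification rate at each cut-off.
err <- sapply(thresshold, function(cut) mean((prob > cut) != y))
names(err) <- thresshold

# Cut-off with the smallest error (first one in case of ties).
best_cut <- thresshold[which.min(err)]
print(round(err, 3))
print(best_cut)
```

A finer grid, for example `seq(0, 1, by = 0.05)`, gives a more precise cut-off at the cost of a longer error vector.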
Details
For the bagging procedure, it is mandatory to set maxcompete = 0 and maxsurrogate = 0 within the rpart arguments. This ensures that variable importance is computed correctly.
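The requirement above can be written down as an `args.rpart` list together with a small guard. The helper `check_bagging_args` is hypothetical (it is not part of GPLTR) and only sketches one way to catch a missing setting before a long bagging run starts.

```r
# Control options for the tree part; maxcompete = 0 and maxsurrogate = 0
# are the two settings required for the bagging procedure.
args.rpart <- list(minbucket = 10, maxdepth = 4, cp = 0,
                   maxcompete = 0, maxsurrogate = 0)

# Hypothetical helper (not part of GPLTR): stop early when either
# required option is missing or non-zero.
check_bagging_args <- function(args) {
  required <- c("maxcompete", "maxsurrogate")
  ok <- vapply(required,
               function(nm) !is.null(args[[nm]]) && args[[nm]] == 0,
               logical(1))
  if (!all(ok)) {
    stop("set to 0 in args.rpart: ", paste(required[!ok], collapse = ", "))
  }
  invisible(TRUE)
}

check_bagging_args(args.rpart)
```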
Value
A list with eleven elements
IND_OOB |
A list of length |
EOOB |
The vector of OOB errors of the bagging procedure for each threshold value. |
OOB_ERRORS_PBP |
A matrix with |
OOB_ERROR_PBP |
A vector containing the mean of |
Tree_BAG |
A list of length |
Glm_BAG |
A list of length |
LOST |
The 0-1 loss matrix for OOB observations at each threshold value |
TEST |
A value of |
Var_IMP |
A numeric vector containing the relative variable importance of the bagging procedure |
Timediff |
The execution time of the bagging procedure |
CUT |
The threshold value used inside the bagging procedure |
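For readers unfamiliar with the OOB terminology used in the elements above, the following base R sketch shows the conventional definition from Breiman (1996): each bag is a bootstrap sample, and its out-of-bag observations are those never drawn. The sizes `n` and `Bag` and the object names are hypothetical, and this is not claimed to reproduce how IND_OOB is built internally.

```r
set.seed(1)   # reproducible illustration
n   <- 10     # hypothetical number of observations
Bag <- 3      # hypothetical number of bagging samples

# One bootstrap sample (drawn with replacement) per bag.
in_bag <- lapply(seq_len(Bag), function(b) sample.int(n, n, replace = TRUE))

# OOB indices: observations that were never drawn in the matching sample.
ind_oob <- lapply(in_bag, function(idx) setdiff(seq_len(n), idx))

str(ind_oob)
```

Each observation is then evaluated only on the bags for which it is out-of-bag, which is what makes the OOB error an honest estimate without a separate test set.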
Author(s)
Cyprien Mbogning
References
Mbogning, C., Perdry, H., Broet, P.: A Bagged partially linear tree-based regression procedure for prediction and variable selection. Human Heredity (To appear) (2015)
Breiman, L.: Bagging predictors. Machine Learning 24, 123-140 (1996)
See Also
Examples
## Not run:
## load the data set
data(burn)
## set the parameters
args.rpart <- list(minbucket = 10, maxdepth = 4, cp = 0,
maxcompete = 0, maxsurrogate = 0)
family <- "binomial"
Y.name <- "D2"
X.names <- "Z2"
G.names <- c('Z1','Z3','Z4','Z5','Z6','Z7','Z8','Z9','Z10','Z11')
args.parallel <- list(numWorkers = 1)
## Bagging a set of basic unpruned pltr predictors
Bag.burn <- bagging.pltr(burn, Y.name, X.names, G.names, family,
args.rpart, epsi = 0.01, iterMax = 4, iterMin = 3,
args.parallel = args.parallel,
Bag = 20, verbose = FALSE, doprune = FALSE)
## End(Not run)