build.UBaymodel {UBayFS} | R Documentation |
Build an ensemble for UBayFS
Description
Build a data structure for UBayFS and train an ensemble of elementary feature selectors.
Usage
build.UBaymodel(
  data,
  target,
  M = 100,
  tt_split = 0.75,
  nr_features = "auto",
  method = "mRMR",
  prior_model = "dirichlet",
  weights = 1,
  constraints = NULL,
  lambda = 1,
  optim_method = "GA",
  popsize = 50,
  maxiter = 100,
  shiny = FALSE,
  ...
)
Arguments
data |
a matrix of input data |
target |
a vector of input labels; for binary problems a factor variable should be used |
M |
the number of elementary models to be trained in the ensemble |
tt_split |
the ratio of samples drawn for building an elementary model (train-test-split) |
nr_features |
number of features to select in each elementary model; if 'auto', a randomized number of features is used in each elementary model |
method |
a vector denoting the method(s) used as elementary models; options: 'mRMR', 'laplace' (Laplacian score). Self-defined functions can also be used as methods; they must have the arguments X (data), y (target), n (number of features) and name (name of the function). For more details, see the examples. |
prior_model |
a string denoting the prior model to use; options: 'dirichlet', 'wong', 'hankin'; 'hankin' is the most general prior model, but also the most time-consuming |
weights |
the vector of user-defined prior weights for each feature |
constraints |
a list containing a relaxed system 'Ax<=b' of user constraints, given as matrix 'A', vector 'b' and vector or scalar 'rho' (relaxation parameter). At least one max-size constraint must be included. For details, see buildConstraints. |
lambda |
a positive scalar denoting the overall strength of the constraints |
optim_method |
the method to evaluate the posterior distribution. Currently, only the option 'GA' (genetic algorithm) is supported. |
popsize |
size of the initial population of the genetic algorithm for model optimization |
maxiter |
maximum number of iterations of the genetic algorithm for model optimization |
shiny |
TRUE indicates that the function is called from the Shiny dashboard |
... |
additional arguments |
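The max-size constraint required by 'constraints' can be read as one row of the system 'Ax<=b'. The following base R sketch illustrates that reading under stated assumptions: 'A' is a single row of ones, 'b' is the maximum number of features, and the relaxation parameter 'rho' is ignored, so only the hard constraint is checked. The helper is_feasible is hypothetical and not part of UBayFS.

```r
# Hypothetical illustration in base R; not the UBayFS internals.
p <- 5                               # number of features
A <- matrix(1, nrow = 1, ncol = p)   # one constraint row: counts the selected features
b <- 3                               # at most 3 features may be selected

# x is a 0/1 membership vector; it is feasible if every row of A x <= b holds
is_feasible <- function(x, A, b) all(A %*% x <= b)

x_ok  <- c(1, 0, 1, 0, 1)   # selects 3 features
x_bad <- c(1, 1, 1, 1, 0)   # selects 4 features

is_feasible(x_ok, A, b)
is_feasible(x_bad, A, b)
```

The first call returns TRUE and the second FALSE, because the second vector selects more than 'b' features.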
Details
The function aggregates the input parameters for UBayFS (the data, the parameters defining the ensemble and the user knowledge, and the parameters specifying the optimization procedure) and trains the ensemble model.
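As a rough picture of how 'M' and 'tt_split' interact, the base R sketch below draws one training subset per elementary model. It is an assumption-laden illustration only: the sampling scheme the package actually uses (for example, whether it is stratified) is not stated here, so plain random sampling without replacement stands in for it, and the names n_samples and train_idx are hypothetical.

```r
# Hypothetical illustration in base R; not the UBayFS internals.
set.seed(1)
n_samples <- 200    # number of rows in the data
M         <- 5      # number of elementary models
tt_split  <- 0.75   # share of rows used to build each elementary model

# one index vector per elementary model, sampled without replacement
train_idx <- lapply(seq_len(M), function(i) {
  sample.int(n_samples, size = floor(tt_split * n_samples))
})

lengths(train_idx)   # number of training rows per elementary model
```

Each elementary feature selector would then be fitted on its own subset, which is what makes the ensemble members differ.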
Value
a 'UBaymodel' object containing the following list elements:
'data' - the input dataset
'target' - the input target
'lambda' - the input lambda value (constraint strength)
'prior_model' - the chosen prior model
'ensemble.params' - information about input and output of ensemble feature selection
'constraint.params' - parameters representing the constraints
'user.params' - parameters representing the user's prior knowledge
'optim.params' - optimization parameters
Examples
# build a UBayFS model using Breast Cancer Wisconsin dataset
data(bcw) # dataset
c <- buildConstraints(constraint_types = 'max_size',
                      constraint_vars = list(10),
                      num_elements = ncol(bcw$data),
                      rho = 1) # prior constraints
w <- rep(1, ncol(bcw$data)) # weights
model <- build.UBaymodel(
  data = bcw$data,
  target = bcw$labels,
  M = 20,
  constraints = c,
  weights = w
)
# use a function computing a decision tree as input
library('rpart')
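decision_tree <- function(X, y, n, name = 'tree'){
  # combine target and features into one data frame with syntactically valid names
  rf_data <- as.data.frame(cbind(y, X))
  colnames(rf_data) <- make.names(colnames(rf_data))
  # fit the tree and keep the n most important variables (fewer if the tree uses fewer)
  tree <- rpart::rpart(y ~ ., data = rf_data)
  top_vars <- head(names(tree$variable.importance), n)
  # compare against the same sanitized names that were used for fitting
  return(list(ranks = which(make.names(colnames(X)) %in% top_vars),
              name = name))
}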
model <- build.UBaymodel(
  data = bcw$data,
  target = bcw$labels,
  constraints = c,
  weights = w,
  method = decision_tree
)
# include block-constraints
c_block <- buildConstraints(constraint_types = 'max_size',
                            constraint_vars = list(2),
                            num_elements = length(bcw$blocks),
                            rho = 10,
                            block_list = bcw$blocks)
model <- setConstraints(model, c_block)
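
The contract for self-defined methods (arguments X, y, n and name; a returned list with the elements ranks and name) can also be checked without any additional package. The following is a minimal sketch, assuming, as in the decision tree example, that ranks holds the column indices of the selected features. The function cor_selector and the simulated data are hypothetical and serve only to illustrate the interface.

```r
# Hypothetical selector: choose the n features with the largest absolute
# correlation with the (numerically coded) target.
cor_selector <- function(X, y, n, name = "abs_cor") {
  scores <- abs(cor(X, as.numeric(y)))[, 1]
  ranks  <- order(scores, decreasing = TRUE)[seq_len(min(n, ncol(X)))]
  list(ranks = ranks, name = name)
}

# simulated data in which only the second feature carries signal
set.seed(42)
X <- matrix(rnorm(100 * 4), ncol = 4,
            dimnames = list(NULL, paste0("f", 1:4)))
y <- factor(ifelse(X[, 2] + 0.1 * rnorm(100) > 0, "a", "b"))

res <- cor_selector(X, y, n = 2)
res$ranks   # column indices of the two selected features
res$name
```

A function of this shape could then be passed as 'method' in the same way as decision_tree above.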