brif.default {brif} | R Documentation |
Build a model taking a data frame as input
Description
Build a model taking a data frame as input
Usage
## Default S3 method:
brif(
x,
n_numeric_cuts = 31,
n_integer_cuts = 31,
max_integer_classes = 20,
max_depth = 20,
min_node_size = 1,
ntrees = 200,
ps = 0,
max_factor_levels = 30,
seed = 0,
bagging_method = 0,
bagging_proportion = 0.9,
split_search = 4,
search_radius = 5,
verbose = 0,
nthreads = 2,
...
)
Arguments
x |
a data frame containing the training data set. The first column is taken as the target variable and all other columns are used as predictors. |
n_numeric_cuts |
an integer value indicating the maximum number of split points to generate for each numeric variable. |
n_integer_cuts |
an integer value indicating the maximum number of split points to generate for each integer variable. |
max_integer_classes |
an integer value. If the target variable is integer and has more than max_integer_classes unique values in the training data, then the target variable will be grouped into max_integer_classes bins. If the target variable is numeric, then the number of bins created on the target variable is the smaller of max_integer_classes and the number of unique values, and the regression problem is solved as a classification problem. |
max_depth |
an integer specifying the maximum depth of each tree. Maximum is 40. |
min_node_size |
an integer specifying the minimum number of training cases a leaf node must contain. |
ntrees |
an integer specifying the number of trees in the forest. |
ps |
an integer indicating the number of predictors to sample at each node split. The default, 0, means that sqrt(p) predictors are sampled, where p is the number of predictors in the input. |
max_factor_levels |
an integer. If any factor variable has more than max_factor_levels levels, the program stops and prompts the user to increase the value of this parameter, in case the large number of levels is indeed intended. |
seed |
an integer specifying the seed used by the internal random number generator. The default, 0, means that no seed is set internally and the seed already set in the calling environment is used. |
bagging_method |
an integer indicating the bagging sampling method: 0 for sampling without replacement; 1 for sampling with replacement (bootstrapping). |
bagging_proportion |
a numeric scalar between 0 and 1, indicating the proportion of training observations to be used in each tree. |
split_search |
an integer indicating the choice of the split search method. 0: randomly pick a split point; 1: do a local search; 2: random pick subject to regulation; 3: local search subject to regulation; 4 or above: a mix of options 0 to 3. |
search_radius |
a positive integer indicating the split point search radius. This parameter takes effect only in the self-regulating local search (split_search = 2 or above). |
verbose |
an integer (0 or 1) specifying the verbose level. |
nthreads |
an integer specifying the number of threads used by the program. This parameter takes effect only on systems supporting OpenMP. |
... |
additional arguments. |
Value
an object of class brif, which is a list containing the following components. Note: this object is not intended for any use other than by the function predict.brif. Do not apply the str function to this object, because the output can be long and uninformative, especially when ntrees is large. Use summary to get a peek at its structure. Use printRules to print the decision rules of a particular tree. Most of the data in the object is stored in the tree_leaves element of this list, which is itself a list of lists.
p |
an integer scalar, the number of variables (predictors) used in the model |
var_types |
a character vector of length (p+1) containing the variable types, including that of the target variable as its first element |
var_labels |
a character vector of length (p+1) containing the variable names, including the target variable name as its first element |
n_bcols |
an integer vector of length (p+1), containing the numbers of binary columns generated for each variable |
ntrees |
an integer scalar indicating the number of trees in the model |
index_in_group |
an integer vector specifying the internal index, for each variable, in its type group |
numeric_cuts |
a list containing split point information on numeric variables |
integer_cuts |
a list containing split point information on integer variables |
factor_cuts |
a list containing split point information on factor variables |
n_num_vars |
an integer scalar indicating the number of numeric variables in the model |
n_int_vars |
an integer scalar indicating the number of integer variables in the model |
n_fac_vars |
an integer scalar indicating the number of factor variables in the model |
tree_leaves |
a list containing all the leaves in the forest |
yc |
a list containing the target variable encoding scheme |
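As a hedged illustration of working with the returned object, the sketch below reads a few scalar components by name, calls summary, and passes the object to predict. It assumes the brif package is installed; the component names come from the list above, while the positional newdata argument and the type = "class" value passed to predict are assumptions that should be checked against the predict.brif help page.

```r
# Sketch only: everything inside the if-block needs the 'brif' package.
set.seed(1)
train_rows <- sample(nrow(iris), 100)
train  <- iris[train_rows, c(5, 1:4)]   # target (Species) in the first column
test_x <- iris[-train_rows, 1:4]        # predictors only

if (requireNamespace("brif", quietly = TRUE)) {
  fit <- brif::brif(train, ntrees = 50, nthreads = 1)

  # Read small components by name rather than calling str() on the object.
  print(fit$p)           # number of predictors
  print(fit$ntrees)      # number of trees
  print(fit$n_num_vars)  # number of numeric variables

  print(summary(fit))    # compact overview of the structure

  # The 'type' argument and its value are assumed here; see ?predict.brif.
  pred <- predict(fit, test_x, type = "class")
  print(head(pred))
}
```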