R: Build a model taking a data frame as input

brif.default {brif}

R Documentation

Build a model taking a data frame as input

Description

Builds a brif model from a data frame in which the first column is the target variable and the remaining columns are predictors.

Usage

## Default S3 method:
brif(
  x,
  n_numeric_cuts = 31,
  n_integer_cuts = 31,
  max_integer_classes = 20,
  max_depth = 20,
  min_node_size = 1,
  ntrees = 200,
  ps = 0,
  max_factor_levels = 30,
  seed = 0,
  bagging_method = 0,
  bagging_proportion = 0.9,
  split_search = 4,
  search_radius = 5,
  verbose = 0,
  nthreads = 2,
  ...
)

Arguments

`x`	a data frame containing the training data set. The first column is taken as the target variable and all other columns are used as predictors.
`n_numeric_cuts`	an integer value indicating the maximum number of split points to generate for each numeric variable.
`n_integer_cuts`	an integer value indicating the maximum number of split points to generate for each integer variable.
`max_integer_classes`	an integer value. If the target variable is integer and has more than max_integer_classes unique values in the training data, it is grouped into max_integer_classes bins. If the target variable is numeric, it is grouped into a number of bins equal to the smaller of max_integer_classes and the number of unique values, and the regression problem is solved as a classification problem.
`max_depth`	an integer specifying the maximum depth of each tree. Maximum is 40.
`min_node_size`	an integer specifying the minimum number of training cases a leaf node must contain.
`ntrees`	an integer specifying the number of trees in the forest.
`ps`	an integer indicating the number of predictors to sample at each node split. Default is 0, which means sqrt(p) is used, where p is the number of predictors in the input.
`max_factor_levels`	an integer. If any factor variable has more than max_factor_levels levels, the program stops and prompts the user to increase the value of this parameter, in case the factor with that many levels is indeed intended.
`seed`	an integer specifying the seed used by the internal random number generator. Default is 0, meaning that no seed is set and the seed from the calling environment is used.
`bagging_method`	an integer indicating the bagging sampling method: 0 for sampling without replacement; 1 for sampling with replacement (bootstrapping).
`bagging_proportion`	a numeric scalar between 0 and 1, indicating the proportion of training observations to be used in each tree.
`split_search`	an integer indicating the choice of the split search method. 0: randomly pick a split point; 1: do a local search; 2: random pick subject to regulation; 3: local search subject to regulation; 4 or above: a mix of options 0 to 3.
`search_radius`	a positive integer indicating the split point search radius. This parameter takes effect only in the self-regulating local search (split_search = 2 or above).
`verbose`	an integer (0 or 1) specifying the verbose level.
`nthreads`	an integer specifying the number of threads used by the program. This parameter takes effect only on systems supporting OpenMP.
`...`	additional arguments.
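
The requirement on `x` (target in the first column) and the max_factor_levels check can be prepared for before fitting. The following is a minimal sketch in base R only; the data frame `df`, its column names, and the helper variables `target`, `n_levels`, and `too_many` are hypothetical and serve only as illustration.

```r
# Hypothetical toy data; the target column "bought" is last on purpose.
df <- data.frame(
  age    = c(23L, 35L, 41L, 29L),            # integer predictor
  income = c(41.5, 58.2, 73.0, 50.1),        # numeric predictor
  region = factor(c("north", "south", "south", "east")),
  bought = factor(c("no", "yes", "yes", "no"))
)

# Reorder so that the target is the first column, as brif.default expects.
target <- "bought"
x <- df[, c(target, setdiff(names(df), target))]

# Inspect column types: integer, numeric and factor columns are treated differently.
col_types <- vapply(x, function(col) class(col)[1], character(1))

# Find factors whose number of levels exceeds the intended max_factor_levels.
max_factor_levels <- 30
n_levels <- vapply(x, nlevels, integer(1))   # nlevels() is 0 for non-factors
too_many <- names(n_levels)[n_levels > max_factor_levels]
```

If `too_many` is non-empty, either raise max_factor_levels in the call or recode the listed factors before fitting.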

Value

an object of class brif, which is a list containing the components listed below. Note: this object is intended only for use by the function predict.brif. Do not apply the str function to it, because the output can be long and uninformative, especially when ntrees is large. Use summary to get a peek at its structure, and printRules to print the decision rules of a particular tree. Most of the data is stored in the tree_leaves element, which is itself a list of lists.

`p`	an integer scalar, the number of variables (predictors) used in the model
`var_types`	a character vector of length (p+1) containing the variable types, including that of the target variable as its first element
`var_labels`	a character vector of length (p+1) containing the variable names, including the target variable name as its first element
`n_bcols`	an integer vector of length (p+1), containing the numbers of binary columns generated for each variable
`ntrees`	an integer scalar indicating the number of trees in the model
`index_in_group`	an integer vector specifying the internal index, for each variable, in its type group
`numeric_cuts`	a list containing split point information on numeric variables
`integer_cuts`	a list containing split point information on integer variables
`factor_cuts`	a list containing split point information on factor variables
`n_num_vars`	an integer scalar indicating the number of numeric variables in the model
`n_int_vars`	an integer scalar indicating the number of integer variables in the model
`n_fac_vars`	an integer scalar indicating the number of factor variables in the model
`tree_leaves`	a list containing all the leaves in the forest
`yc`	a list containing the target variable encoding scheme
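
Examples

A minimal sketch of a typical workflow, assuming the brif package is installed: fit on a data frame with the target in the first column, inspect the result with summary, and pass it to predict. The use of the iris data, the train/test split, and the chosen argument values are illustrative assumptions; the exact form of the value returned by predict is not described on this page and is not relied on here.

```r
# Split iris into training and test rows (illustrative choice of data).
set.seed(1)
idx  <- sample(nrow(iris), 100)
cols <- c("Species", setdiff(names(iris), "Species"))  # target first
train <- iris[idx, cols]
test  <- iris[-idx, cols]

# Only run the model code when the brif package is available.
if (requireNamespace("brif", quietly = TRUE)) {
  # A data frame input dispatches to the default method documented above.
  model <- brif::brif(train, ntrees = 50, max_depth = 10, seed = 1)

  # summary() is the recommended way to peek at the object; avoid str().
  summary(model)

  # Predict on the held-out predictors (target column dropped).
  pred <- predict(model, test[, -1])
  head(pred)
}
```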

[Package brif version 1.4.1 Index]