glmtree {glmtree} | R Documentation |
Logistic regression tree by Stochastic-Expectation-Maximization
Description
This function produces a logistic regression tree: a decision tree with logistic regressions at its leaves.
Usage
glmtree(
x,
y,
K = 10,
iterations = 200,
test = FALSE,
validation = FALSE,
proportions = c(0.3),
criterion = "bic",
ctree_controls = partykit::ctree_control(alpha = 0.1, minbucket = 100, maxdepth = 5)
)
Arguments
x |
The features to use for prediction. |
y |
The binary / boolean labels to predict. |
K |
The number of segments to start with, which is also the maximum number of segments in the final tree. |
iterations |
The number of iterations of the SEM algorithm (default: 200). |
test |
Boolean: TRUE if the algorithm should set aside part of the input observations as a test set, on which the provided criterion is calculated for the best discretization scheme. That scheme is itself selected with the provided criterion, on either the test set (if TRUE) or the training set (otherwise) (default: FALSE). |
validation |
Boolean: TRUE if the algorithm should set aside part of the input observations as a validation set, on which the best discretization scheme is searched for using the provided criterion (default: FALSE). |
proportions |
The proportions of the input observations to assign to the test and validation sets. Not used when both test and validation are FALSE. Only the first value is used when exactly one of test or validation is TRUE. Produces an error when the sum is greater than one. Default: c(0.3). For example, c(0.2, 0.2) with both test and validation set to TRUE leaves 0.6 of the input observations for the training set. |
criterion |
The criterion ('gini', 'aic', 'bic') used to choose the best discretization scheme among the generated ones (default: 'bic'). Note: it is best to use 'gini' only when test is set to TRUE, and 'aic' or 'bic' when it is not. When 'aic' or 'bic' is used with a test set, the likelihood is returned instead, as there is no need to penalize for generalization purposes. |
ctree_controls |
The controls to use for 'partykit::ctree' (default: partykit::ctree_control(alpha = 0.1, minbucket = 100, maxdepth = 5)). |
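The interplay of test, validation and proportions described above can be sketched in base R. This is only an illustration under stated assumptions: `split_indices` is a hypothetical helper, not a function of the glmtree package, and the way the package actually draws its samples may differ.

```r
# Hypothetical helper (not part of glmtree): split row indices into
# training, test and validation sets following the documented rules.
split_indices <- function(n, test = FALSE, validation = FALSE,
                          proportions = c(0.3)) {
  proportions <- unlist(proportions)  # accept a list or a numeric vector
  idx <- sample.int(n)                # random permutation of the rows
  none <- integer(0)

  # proportions is not used when both flags are FALSE
  if (!test && !validation) {
    return(list(train = idx, test = none, validation = none))
  }

  # Only the first proportion is used when a single hold-out set is requested
  used <- if (test && validation) proportions[1:2] else proportions[1]
  if (anyNA(used)) stop("two proportions are needed when both flags are TRUE")
  if (sum(used) > 1) stop("the proportions must not sum to more than one")

  sizes <- round(n * used)
  first <- idx[seq_len(sizes[1])]
  second <- if (test && validation) idx[sizes[1] + seq_len(sizes[2])] else none
  train <- setdiff(idx, c(first, second))  # everything not held out

  if (test && validation) {
    list(train = train, test = first, validation = second)
  } else if (test) {
    list(train = train, test = first, validation = none)
  } else {
    list(train = train, test = none, validation = first)
  }
}

set.seed(1)
parts <- split_indices(100, test = TRUE, validation = TRUE,
                       proportions = c(0.2, 0.2))
lengths(parts)  # sizes of the training, test and validation sets
```

With proportions of 0.2 and 0.2, the training set keeps the remaining 0.6 of the observations, and the three sets do not overlap.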
Value
An S4 object of class 'glmtree' that contains the parameters used to search for the logistic regression tree, the best tree it found, and its performance.
Author(s)
Adrien Ehrhardt
Examples
data <- generateData(n = 100, scenario = "no tree")
glmtree(x = data[, c("x1", "x2")], y = data$y, K = 5, iterations = 80, criterion = "aic")
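The note on the criterion argument (penalized criteria on training data, 'gini' on held-out data) can be illustrated with a single logistic regression in base R. This is a sketch, not the package's internal computation: the simulated data, the 350/150 split and the `gini` helper are assumptions made for the example, and the Gini index is taken here in its common form, 2 * AUC - 1.

```r
set.seed(42)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.8 * x))

# Hold out the last 150 observations as an illustrative test set
train <- seq_len(n) <= 350
fit <- glm(y ~ x, family = binomial,
           data = data.frame(x = x[train], y = y[train]))

# AIC and BIC penalize the training likelihood for model complexity
train_aic <- AIC(fit)
train_bic <- BIC(fit)

# Hypothetical helper: Gini index as 2 * AUC - 1, with the AUC computed
# from the Mann-Whitney rank statistic
gini <- function(labels, scores) {
  r <- rank(scores)
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  auc <- (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  2 * auc - 1
}

# Gini needs no complexity penalty when it is measured on held-out data
p_test <- predict(fit, newdata = data.frame(x = x[!train]), type = "response")
holdout_gini <- gini(y[!train], p_test)

c(aic = train_aic, bic = train_bic, gini = holdout_gini)
```

A Gini index computed on the training set rewards more complex models without any penalty, which is why the documentation recommends pairing it with a test set.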