glmtree {glmtree}    R Documentation

Logistic regression tree by Stochastic-Expectation-Maximization

Description

This function produces a logistic regression tree: a decision tree with logistic regressions at its leaves.
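
As a rough illustration of what "logistic regressions at its leaves" means, the base R sketch below fits one logistic regression per segment of a toy data set. The segments are fixed by hand through a hypothetical single split on x1, whereas glmtree() searches for them; the variable names and the simulated data are assumptions made for this example only, not the package's internals.

```r
# Illustrative only: the segments are given here, glmtree() has to find them.
set.seed(1)
n <- 400
x1 <- runif(n)
x2 <- runif(n)

# Hypothetical one-split tree on x1, giving two leaves
segment <- ifelse(x1 < 0.5, "left", "right")

# The effect of x2 on y has opposite signs in the two leaves
eta <- ifelse(segment == "left", -2 + 4 * x2, 2 - 4 * x2)
y <- rbinom(n, size = 1, prob = plogis(eta))
toy <- data.frame(x1 = x1, x2 = x2, y = y, segment = segment)

# One logistic regression per leaf
leaf_models <- lapply(split(toy, toy$segment), function(d) {
  glm(y ~ x2, data = d, family = binomial)
})

# Coefficients by leaf: rows are terms, columns are leaves
leaf_coefs <- sapply(leaf_models, coef)
leaf_coefs
```

A single logistic regression on the pooled data would average out the two opposite slopes, which is the kind of situation a logistic regression tree is meant to handle.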

Usage

glmtree(
  x,
  y,
  K = 10,
  iterations = 200,
  test = FALSE,
  validation = FALSE,
  proportions = c(0.3),
  criterion = "bic",
  ctree_controls = partykit::ctree_control(alpha = 0.1, minbucket = 100, maxdepth = 5)
)

Arguments

x

The features to use for prediction.

y

The binary / boolean labels to predict.

K

The number of segments to start with, which is also the maximum number of segments in the final tree (default: 10).

iterations

The number of iterations to run in the SEM algorithm (default: 200).

test

Boolean: TRUE if the algorithm should hold out part of the data as a test set on which to calculate the provided criterion for the best logistic regression tree. The best tree is chosen with the provided criterion on either the test set (if TRUE) or the training set (otherwise). Default: FALSE, as shown in Usage.

validation

Boolean: TRUE if the algorithm should hold out part of the data as a validation set on which to search for the best logistic regression tree using the provided criterion. Default: FALSE, as shown in Usage.

proportions

The proportions wanted for the test and validation sets. Not used when both test and validation are FALSE. Only the first value is used when exactly one of test or validation is TRUE. Produces an error when the sum is greater than one. Default: c(0.3), as shown in Usage. For example, c(0.2, 0.2) with both test and validation set to TRUE leaves 0.6 of the input observations for the training set.
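
The way two proportions carve up the observations can be sketched in base R as follows. This is a minimal illustration, not the package's internal splitting code: the index names (test_idx, valid_idx, train_idx), the random shuffling, and the rounding are assumptions made for the example.

```r
# Hypothetical split of n observations into test, validation and training sets
set.seed(42)
n <- 1000
proportions <- c(0.2, 0.2)  # test share first, then validation share

# Mirror the documented constraint: the shares cannot sum to more than one
stopifnot(sum(proportions) <= 1)

shuffled <- sample.int(n)
n_test <- round(proportions[1] * n)
n_valid <- round(proportions[2] * n)

test_idx <- shuffled[seq_len(n_test)]
valid_idx <- shuffled[n_test + seq_len(n_valid)]
train_idx <- setdiff(shuffled, c(test_idx, valid_idx))

# Share of observations left for training
length(train_idx) / n
```

With these shares the three index sets are disjoint and the training set keeps the remaining 0.6 of the observations.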

criterion

The criterion ('gini', 'aic', 'bic') used to choose the best logistic regression tree among the generated ones (default: 'bic', as shown in Usage). Note: it is best to use 'gini' only when test is set to TRUE, and 'aic' or 'bic' when it is not. When 'aic' or 'bic' is used with a test set, the likelihood is returned instead, since no penalty is needed to assess generalization on held-out data.
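
To make the three criteria concrete, the sketch below computes them for a single logistic regression in base R. It assumes that 'gini' refers to the usual Gini index, 2 * AUC - 1; how the package computes it internally is not stated here. The helper gini_index() and the simulated data are hypothetical and written for this example only.

```r
# Simulated binary outcome and a plain logistic regression
set.seed(7)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * x))
fit <- glm(y ~ x, family = binomial)

# Penalized likelihood criteria (lower is better)
aic_value <- AIC(fit)
bic_value <- BIC(fit)

# Gini index as 2 * AUC - 1, with the AUC from the rank (Mann-Whitney) formula
gini_index <- function(labels, scores) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  auc <- (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
  2 * auc - 1
}

# Discrimination measure (higher is better), here on the training data
gini_value <- gini_index(y, fitted(fit))
c(aic = aic_value, bic = bic_value, gini = gini_value)
```

AIC and BIC penalize the number of parameters, which matters when they are evaluated on the data used for fitting. The Gini index carries no such penalty, which fits the advice to reserve it for a held-out test set.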

ctree_controls

The controls to use for 'partykit::ctree', built with 'partykit::ctree_control' (default: alpha = 0.1, minbucket = 100, maxdepth = 5).

Value

An S4 object of class 'glmtree' that contains the parameters used to search for the logistic regression tree, the best tree it found, and its performance.

Author(s)

Adrien Ehrhardt

Examples

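# Generate 100 observations with the package's generateData() helper
data <- generateData(n = 100, scenario = "no tree")
# Search for a logistic regression tree, starting from 5 segments and using AIC
glmtree(x = data[, c("x1", "x2")], y = data$y, K = 5, iterations = 80, criterion = "aic")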

[Package glmtree version 0.3.1 Index]