setartree {setartree}    R Documentation

Fitting SETAR-Tree models

Description

Fits a SETAR-Tree model either using a list of time series or an embedded input matrix and labels.

Usage

setartree(
  data,
  label = NULL,
  lag = 10,
  depth = 1000,
  significance = 0.05,
  significance_divider = 2,
  error_threshold = 0.03,
  stopping_criteria = "both",
  mean_normalisation = FALSE,
  window_normalisation = FALSE,
  verbose = 2,
  categorical_covariates = NULL
)

Arguments

data

A list of time series (each list element is a separate time series) or a dataframe/matrix containing model inputs (the columns can contain past time series lags and/or external numerical/categorical covariates).

label

A vector of true outputs. This parameter is only required when data is a dataframe/matrix containing the model inputs.

lag

The number of past time series lags that should be used when fitting the SETAR-Tree. This parameter is only required when data is a list of time series. Default value is 10.

depth

Maximum tree depth. Default value is 1000. Thus, unless you specify a lower value, the depth is in practice controlled by the stopping criterion.

significance

Initial significance used by the linearity test (alpha_0). Default value is 0.05.

significance_divider

The significance used by the linearity test is divided by this value at each successive tree level. Default value is 2.

error_threshold

The minimum error reduction percentage between parent and child nodes to make a split. Default value is 0.03.

stopping_criteria

The stopping criterion to use: linearity test ("lin_test"), error reduction percentage ("error_imp"), or both linearity test and error reduction percentage ("both"). Default value is "both".

mean_normalisation

Whether each series should be normalised by subtracting its mean value before building the tree. This parameter is only required when data is a list of time series. Default value is FALSE.

window_normalisation

Whether window-wise normalisation should be applied before building the tree. This parameter is only required when data is a list of time series. When this is TRUE, each row of the training embedded matrix is normalised by subtracting its mean value before building the tree. Default value is FALSE.

verbose

Controls the verbosity level of SETAR-Tree: 0 (errors/warnings only), 1 (limited information, including the current tree depth), 2 (full training information, including the current tree depth and the stopping criterion results in each tree node). Default value is 2.

categorical_covariates

Names of the categorical covariates in the input data. This parameter is only required when data is a dataframe/matrix and it contains categorical variables.
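
As a rough illustration of how significance, significance_divider and error_threshold interact, the sketch below computes a per-level significance schedule and an error reduction check in base R. It assumes that the root level uses alpha_0 unchanged, that each deeper level divides the previous level's value once, and that error reduction is measured as a fraction of the parent node's error. The helpers level_significance and passes_error_threshold are hypothetical names used for illustration only; they are not part of the package.

```r
# Hypothetical helper: significance used at a given tree level,
# assuming level 0 (the root) uses alpha_0 unchanged.
level_significance <- function(alpha_0 = 0.05, divider = 2, level = 0) {
  alpha_0 / divider^level
}

# Hypothetical helper: TRUE when a split reduces the error by at least
# `threshold`, measured as a fraction of the parent node's error (assumption).
passes_error_threshold <- function(parent_error, child_error, threshold = 0.03) {
  (parent_error - child_error) / parent_error >= threshold
}

# Significance at levels 0 to 3 with the default settings:
# 0.05, 0.025, 0.0125, 0.00625
sig <- level_significance(level = 0:3)
print(sig)

# A 5% reduction clears the default 0.03 threshold; a 1% reduction does not.
print(passes_error_threshold(parent_error = 100, child_error = 95))
print(passes_error_threshold(parent_error = 100, child_error = 99))
```

Under these assumptions, a larger significance_divider makes the linearity test stricter more quickly as the tree deepens, while a larger error_threshold demands a bigger improvement before a split is accepted.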

Value

An object of class setartree which contains the following properties.

leaf_models

Trained global pooled regression models in each leaf node.

opt_lags

Optimal features used to split each node.

opt_thresholds

Optimal threshold values used to split each node.

lag

The number of features used to train the SETAR-Tree.

feature_names

Names of the input features.

coefficients

Names of the coefficients of the leaf node regression models.

num_leaves

Number of leaf nodes in the SETAR-Tree.

depth

Depth of the SETAR-Tree which was determined based on the specified stopping criterion.

leaf_instance_dis

Number of instances used to train the regression models at each leaf node.

stds

The standard deviations of the residuals of each leaf node.

categorical_covariate_values

Information about the categorical covariates used during training (only if applicable).

mean_normalisation

Whether mean normalisation was applied to the training data.

window_normalisation

Whether window normalisation was applied to the training data.

input_type

Type of input data used to train the SETAR-Tree. This is list if data is a list of time series, and df if data is a dataframe/matrix containing model inputs.

execution_time

Execution time of SETAR-Tree.
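
The properties listed above can be inspected after fitting. The following is a minimal sketch that assumes the returned object exposes them as named elements accessible with `$`, which this page does not state explicitly. It uses the chaotic_logistic_series dataset from the Examples and runs only when the package is installed.

```r
# Runs only when the setartree package is available.
if (requireNamespace("setartree", quietly = TRUE)) {
  library(setartree)

  # verbose = 0 limits output to errors and warnings.
  fit <- setartree(chaotic_logistic_series, verbose = 0)

  # Assumption: the properties are stored as named list elements.
  print(class(fit))             # class of the fitted object
  print(fit$num_leaves)         # number of leaf nodes
  print(fit$depth)              # depth reached under the stopping criterion
  print(fit$input_type)         # "list" or "df", depending on the input
  print(fit$leaf_instance_dis)  # training instances per leaf
}
```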

Examples


# Training SETAR-Tree with a list of time series
setartree(chaotic_logistic_series)

# Training SETAR-Tree with a dataframe containing model inputs where the model inputs may contain
# past time series lags and numerical/categorical covariates
setartree(data = web_traffic_train[,-1],
          label = web_traffic_train[,1],
          stopping_criteria = "both",
          categorical_covariates = "Project")
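
To make the relationship between the two input forms more concrete, the sketch below embeds a single toy series into a matrix of lagged inputs and a label vector using base R, then applies a row-wise normalisation in the spirit of window_normalisation. The toy values, the column names Lag1 to Lag3, and the choice to subtract the mean of the lagged inputs from both the inputs and the label are assumptions for illustration; the package's internal embedding may differ.

```r
# Toy series; the values are illustrative only.
series <- c(5, 7, 6, 9, 8, 10, 12, 11)
lag <- 3

# embed() places the most recent value in column 1 and older values after it,
# so column 1 serves as the label and the remaining columns as the lags.
embedded <- embed(series, lag + 1)
label <- embedded[, 1]
inputs <- embedded[, -1, drop = FALSE]
colnames(inputs) <- paste0("Lag", seq_len(lag))  # hypothetical column names

# Window-wise normalisation (assumed form): subtract each row's mean of the
# lagged inputs from that row's inputs and from its label.
row_means <- rowMeans(inputs)
inputs_norm <- inputs - row_means
label_norm <- label - row_means

print(inputs)
print(label)
print(inputs_norm)
```

Here inputs and label have the same shape as the data and label arguments in the dataframe/matrix form: one row per training instance, with the matching true output in label.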



[Package setartree version 0.2.1 Index]