audrex {audrex}    R Documentation

audrex: Automatic Dynamic Regression using Extreme Gradient Boosting

Description

Dynamic regression for time series using Extreme Gradient Boosting with hyper-parameter tuning via Bayesian Optimization or Random Search.

Usage

audrex(
  data,
  n_sample = 10,
  n_search = 5,
  smoother = FALSE,
  seq_len = NULL,
  diff_threshold = 0.001,
  booster = "gbtree",
  norm = NULL,
  n_dim = NULL,
  ci = 0.8,
  min_set = 30,
  max_depth = NULL,
  eta = NULL,
  gamma = NULL,
  min_child_weight = NULL,
  subsample = NULL,
  colsample_bytree = NULL,
  lambda = NULL,
  alpha = NULL,
  n_windows = 3,
  patience = 0.1,
  nrounds = 100,
  dates = NULL,
  acq = "ucb",
  kappa = 2.576,
  eps = 0,
  kernel = list(type = "exponential", power = 2),
  seed = 42
)

Arguments

data

A data frame with one time feature per column.

n_sample

Positive integer. Number of samples for the Bayesian Optimization. Default: 10.

n_search

Non-negative integer. Number of search steps for the Bayesian Optimization. When set to 0, optimization switches to Random Search. Default: 5.

smoother

Logical. Perform optimal smoothing using standard loess. Default: FALSE.
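
To illustrate what loess smoothing does to a single series, here is a minimal base R sketch using stats::loess. The synthetic series and the span value are assumptions chosen for illustration. How audrex selects the "optimal" smoothing is not described on this page.

```r
# Synthetic noisy series (illustrative data, not taken from the package)
set.seed(42)
t_idx <- 1:100
y <- sin(t_idx / 10) + rnorm(100, sd = 0.3)

# Fit a local regression; span = 0.3 is an arbitrary choice for this sketch
fit <- loess(y ~ t_idx, span = 0.3)
y_smooth <- predict(fit)

# The smoothed series keeps its length but varies less from point to point
length(y_smooth) == length(y)
sd(diff(y_smooth)) < sd(diff(y))
```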

seq_len

Positive integer. Number of time steps to predict. Default: NULL (automatic selection).

diff_threshold

Positive numeric. Minimum F-test threshold for differencing each time feature (keep it low). Default: 0.001.

booster

String. Booster type. Available options are: "gbtree", "gblinear". Default: "gbtree".

norm

Logical. Boolean flag to apply Yeo-Johnson normalization. Default: NULL (automatic selection from Random Search or Bayesian Optimization).
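
For readers unfamiliar with the Yeo-Johnson transformation, the standard textbook definition can be sketched in base R as follows. The function name yeo_johnson and the fixed lambda are assumptions for this sketch. The package may estimate lambda and implement the transform differently.

```r
# Yeo-Johnson power transform for a numeric vector and a given lambda
yeo_johnson <- function(y, lambda) {
  out <- numeric(length(y))
  pos <- y >= 0
  # Non-negative values
  if (abs(lambda) > 1e-8) {
    out[pos] <- ((y[pos] + 1)^lambda - 1) / lambda
  } else {
    out[pos] <- log1p(y[pos])
  }
  # Negative values
  if (abs(lambda - 2) > 1e-8) {
    out[!pos] <- -(((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda))
  } else {
    out[!pos] <- -log1p(-y[!pos])
  }
  out
}

# With lambda = 1 the transform leaves the data unchanged
yeo_johnson(c(-2, 0, 3), lambda = 1)
```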

n_dim

Positive integer. Projects the time features into a lower-dimensional space with n_dim features. The default value (NULL) automatically sets the search range to c(1, n features).
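
The idea of projecting features into a lower-dimensional space can be sketched with principal component analysis. PCA via stats::prcomp is used here only as a stand-in, since this page does not state which projection audrex applies. The data are synthetic.

```r
# Four synthetic features with 200 observations each (illustrative only)
set.seed(42)
x <- matrix(rnorm(200 * 4), ncol = 4)

# Keep the first n_dim principal components as the projected features
n_dim <- 2
pca <- prcomp(x, center = TRUE, scale. = TRUE)
projected <- pca$x[, seq_len(n_dim), drop = FALSE]

dim(projected)  # 200 rows, n_dim columns
```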

ci

Numeric. Confidence level for the prediction intervals. Default: 0.8.
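
As a rough illustration, a confidence level of 0.8 corresponds to the 10% and 90% quantiles of a predictive distribution. The sketch below assumes a symmetric two-sided interval and uses simulated values. Whether audrex builds its intervals this way is an assumption.

```r
ci <- 0.8

# Lower and upper tail probabilities for a symmetric two-sided interval
probs <- c((1 - ci) / 2, 1 - (1 - ci) / 2)

# Simulated predictive draws for one future time step (illustrative only)
set.seed(42)
simulated <- rnorm(1000, mean = 10, sd = 2)
bounds <- quantile(simulated, probs = probs)
bounds
```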

min_set

Positive integer. Minimum size of the validation set in case of automatic resize of the past dimension. Default: 30.

max_depth

Positive integer. See the xgboost documentation for a description. A vector with one or two positive integers for the search boundaries. The default value (NULL) automatically sets the values to c(1, 8).

eta

Positive numeric. See the xgboost documentation for a description. A vector with one or two positive numerics in (0, 1] for the search boundaries. The default value (NULL) automatically sets the values to c(0, 1).

gamma

Positive numeric. See the xgboost documentation for a description. A vector with one or two positive numerics for the search boundaries. The default value (NULL) automatically sets the values to c(0, 100).

min_child_weight

Positive numeric. See the xgboost documentation for a description. A vector with one or two positive numerics for the search boundaries. The default value (NULL) automatically sets the values to c(0, 100).

subsample

Positive numeric. See the xgboost documentation for a description. A vector with one or two positive numerics in (0, 1] for the search boundaries. The default value (NULL) automatically sets the values to c(0, 1).

colsample_bytree

Positive numeric. See the xgboost documentation for a description. A vector with one or two positive numerics in (0, 1] for the search boundaries. The default value (NULL) automatically sets the values to c(0, 1).

lambda

Positive numeric. See the xgboost documentation for a description. A vector with one or two positive numerics for the search boundaries. The default value (NULL) automatically sets the values to c(0, 100).

alpha

Positive numeric. See the xgboost documentation for a description. A vector with one or two positive numerics for the search boundaries. The default value (NULL) automatically sets the values to c(0, 100).
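
The hyper-parameter arguments above all take a vector of one or two values as search boundaries. The following base R sketch shows how candidate values could be drawn from such boundaries during a random search. The helper draw_param is hypothetical, and treating a single value as a fixed setting is an assumption rather than documented package behavior.

```r
# Hypothetical helper: draw n candidate values from one or two boundary values
draw_param <- function(bounds, n, integer = FALSE) {
  # Assumption: a single value means the parameter is held fixed
  if (length(bounds) == 1) return(rep(bounds, n))
  lo <- min(bounds)
  hi <- max(bounds)
  if (integer) {
    # Uniform draw over the integers lo..hi
    lo + sample.int(hi - lo + 1, n, replace = TRUE) - 1
  } else {
    runif(n, min = lo, max = hi)
  }
}

set.seed(42)
candidates <- data.frame(
  max_depth = draw_param(c(1, 8), 5, integer = TRUE),  # integer range
  eta       = draw_param(c(0, 1), 5),                  # continuous range
  gamma     = draw_param(10, 5)                        # fixed value
)
candidates
```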

n_windows

Positive integer. Number of (expanding) windows for cross-validation. Default: 3.
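
Expanding-window cross-validation trains on a growing prefix of the series and validates on the block that follows. The sketch below is one common way to build such splits. The helper expanding_windows and its horizon argument are hypothetical, and the exact split scheme used by audrex may differ.

```r
# Hypothetical helper: index sets for expanding-window cross-validation
expanding_windows <- function(n_obs, n_windows, horizon) {
  lapply(seq_len(n_windows), function(w) {
    # Training ends earlier for the first folds and grows with each window
    train_end <- n_obs - (n_windows - w + 1) * horizon
    list(
      train = seq_len(train_end),
      valid = seq.int(train_end + 1, train_end + horizon)
    )
  })
}

folds <- expanding_windows(n_obs = 100, n_windows = 3, horizon = 10)

# Training sizes grow while each validation block has the same length
sapply(folds, function(f) length(f$train))
sapply(folds, function(f) range(f$valid))
```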

patience

Positive numeric. Share of rounds to wait without improvement before xgboost stops early. Default: 0.1.

nrounds

Positive numeric. Number of rounds for the extreme gradient boosting machine. See the xgboost documentation for a description. Default: 100.
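
One plausible reading of patience and nrounds together is that training stops after patience * nrounds consecutive rounds without improvement. The sketch below illustrates that logic on a made-up validation error curve. The helper stop_round is hypothetical, and the conversion from patience to a round count is an assumption about how the package passes it to xgboost.

```r
nrounds <- 100
patience <- 0.1

# Assumed conversion: number of rounds to wait without improvement
wait <- max(1, round(patience * nrounds))

# Hypothetical helper: first round at which early stopping would trigger
stop_round <- function(val_error, wait) {
  best <- Inf
  since_best <- 0
  for (i in seq_along(val_error)) {
    if (val_error[i] < best) {
      best <- val_error[i]
      since_best <- 0
    } else {
      since_best <- since_best + 1
    }
    if (since_best >= wait) return(i)
  }
  length(val_error)
}

# Made-up curve: error improves for 20 rounds, then stays worse
val_error <- c(seq(1, 0.5, length.out = 20), rep(0.6, 80))
stop_round(val_error, wait)  # stops 10 rounds after the best round (20)
```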

dates

Date. Vector of dates for the time series. Default: NULL (progressive numbers).
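
A date vector of the right length can be built with base R. In this sketch the start date, the daily frequency, and the number of observations are assumptions. In practice the vector should have one entry per row of data.

```r
# One date per observation; 120 daily observations are assumed here
n_obs <- 120
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = n_obs)

class(dates)
length(dates)

# The vector would then be passed as: audrex(my_data, dates = dates)
# (my_data is a hypothetical data frame with n_obs rows)
```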

acq

String. Parameter for Bayesian Optimization. For reference see rBayesianOptimization documentation. Default: "ucb".

kappa

Positive numeric. Parameter for Bayesian Optimization. For reference see rBayesianOptimization documentation. Default: 2.576.
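
With acq = "ucb", the acquisition function is an upper confidence bound, which scores a candidate by its predicted mean plus kappa times its predicted standard deviation. The sketch below uses made-up means and standard deviations to show how kappa trades exploration against exploitation. The helper ucb is hypothetical and not part of rBayesianOptimization.

```r
# Upper confidence bound for a maximization problem
ucb <- function(mu, sigma, kappa = 2.576) mu + kappa * sigma

# Made-up posterior means and standard deviations for three candidates
mu    <- c(0.50, 0.55, 0.40)
sigma <- c(0.01, 0.02, 0.10)

# A large kappa favors the uncertain candidate; kappa = 0 is pure exploitation
which.max(ucb(mu, sigma))             # candidate 3
which.max(ucb(mu, sigma, kappa = 0))  # candidate 2

# The default 2.576 is the 99.5% quantile of the standard normal
round(qnorm(0.995), 3)
```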

eps

Positive numeric. Parameter for Bayesian Optimization. For reference see rBayesianOptimization documentation. Default: 0.

kernel

List. Parameter for Bayesian Optimization. For reference see rBayesianOptimization documentation. Default: list(type = "exponential", power = 2).

seed

Random seed. Default: 42.

Value

This function returns a list including:

Author(s)

Giancarlo Vercellino giancarlo.vercellino@gmail.com

See Also

Useful links:

Examples


audrex(covid_in_europe[, 2:5], n_sample = 3, n_search = 2, seq_len = 10) # Bayesian Optimization
audrex(covid_in_europe[, 2:5], n_sample = 5, n_search = 0, seq_len = 10) # Random Search (n_search = 0)




[Package audrex version 2.0.1 Index]