xgb_train_offset {offsetreg}    R Documentation

Boosted Poisson Trees with Offsets via xgboost

Description

xgb_train_offset() and xgb_predict_offset() are wrappers for xgboost tree-based models where all of the model arguments are in the main function. These functions are nearly identical to the parsnip functions parsnip::xgb_train() and parsnip::xgb_predict(), except that the objective "count:poisson" is passed to xgboost::xgb.train() and an offset term is added to the data set.
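To illustrate what an offset term does in a Poisson model, here is a minimal base R sketch with made-up numbers. The vectors population and log_rate are hypothetical and are not taken from the package. Under the log link used by the "count:poisson" objective, an offset on the log scale is added to the model output, which is equivalent to multiplying the predicted rate by the exposure.

```r
# Hypothetical values, for illustration only
population <- c(1000, 5000, 20000)   # exposure for three groups
log_rate   <- c(-6.0, -5.5, -5.0)    # stand-in for the summed tree output

# The offset is the exposure on the log scale
offset <- log(population)

# With a log link, the offset is added before exponentiating ...
expected_counts <- exp(offset + log_rate)

# ... which equals scaling the per-unit rate by the exposure
same <- isTRUE(all.equal(expected_counts, population * exp(log_rate)))
print(expected_counts)
print(same)
```

This is why the offset column in the Examples section is built with log() rather than passed as a raw population count.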

Usage

xgb_train_offset(
  x,
  y,
  offset_col = "offset",
  weights = NULL,
  max_depth = 6,
  nrounds = 15,
  eta = 0.3,
  colsample_bynode = NULL,
  colsample_bytree = NULL,
  min_child_weight = 1,
  gamma = 0,
  subsample = 1,
  validation = 0,
  early_stop = NULL,
  counts = TRUE,
  ...
)

xgb_predict_offset(object, new_data, offset_col = "offset", ...)

Arguments

x

A data frame or matrix of predictors.

y

A vector (numeric) or matrix (numeric) of outcome data.

offset_col

Character string. The name of a column in x (or in new_data, for xgb_predict_offset()) containing offsets.

weights

A numeric vector of weights.

max_depth

An integer for the maximum depth of the tree.

nrounds

An integer for the number of boosting iterations.

eta

A numeric value between zero and one to control the learning rate.

colsample_bynode

Subsampling proportion of columns for each node within each tree. See the counts argument below. The default uses all columns.

colsample_bytree

Subsampling proportion of columns for each tree. See the counts argument below. The default uses all columns.

min_child_weight

A numeric value for the minimum sum of instance weights needed in a child to continue to split.

gamma

A number for the minimum loss reduction required to make a further partition on a leaf node of the tree.

subsample

Subsampling proportion of rows. By default, all of the training data are used.

validation

The proportion of the data that are used for performance assessment and potential early stopping.

early_stop

An integer or NULL. If not NULL, it is the number of training iterations without improvement before stopping. If validation is used, performance is based on the validation set; otherwise, the training set is used.

counts

A logical. If FALSE, colsample_bynode and colsample_bytree are both assumed to be proportions of columns (instead of counts).

...

Other options to pass to xgb.train() or xgboost's method for predict().

object

An xgboost object.

new_data

New data for predictions. Can be a data frame, matrix, or xgb.DMatrix.
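The counts and early_stop arguments can be illustrated with a small base R sketch. Both helpers below, count_to_prop() and stop_round(), are hypothetical and written only for this illustration; they are not functions from offsetreg, parsnip, or xgboost. The first assumes that, as in parsnip, a column count is turned into a proportion by dividing by the number of predictor columns. The second is a simplified version of the "no improvement for early_stop iterations" rule, not xgboost's actual implementation.

```r
# Hypothetical helper: convert a column count to a proportion (assumed rule)
count_to_prop <- function(count, n_cols) {
  min(count, n_cols) / n_cols
}

count_to_prop(3, 10)   # 0.3: sample 3 of 10 columns
count_to_prop(20, 10)  # 1: a count above the column total is capped

# Hypothetical helper: iteration at which training would stop, given a
# vector of per-iteration losses (lower is better)
stop_round <- function(loss, early_stop) {
  best <- Inf
  since_best <- 0
  for (i in seq_along(loss)) {
    if (loss[i] < best) {
      best <- loss[i]
      since_best <- 0
    } else {
      since_best <- since_best + 1
    }
    if (since_best >= early_stop) {
      return(i)
    }
  }
  length(loss)  # never triggered: all iterations are run
}

# Made-up loss values for illustration
loss <- c(1.00, 0.80, 0.70, 0.71, 0.72, 0.69, 0.70, 0.71, 0.72)
stop_round(loss, early_stop = 2)  # 5
```

With counts = FALSE, values such as colsample_bynode = 1 are taken directly as proportions, so no conversion of this kind is needed.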

Value

A fitted xgboost object.

Examples

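# Offset on the log scale: log of the population
us_deaths$off <- log(us_deaths$population)

# Predictor matrix including the offset column; drop the intercept
x <- model.matrix(~ age_group + gender + off, us_deaths)[, -1]

# Fit the model; "off" names the offset column in x.
# With counts = FALSE, colsample_bynode = 1 is a proportion (all columns).
mod <- xgb_train_offset(x, us_deaths$deaths, "off",
                        eta = 1, colsample_bynode = 1,
                        max_depth = 2, nrounds = 25,
                        counts = FALSE)

# Predict on the training data, again naming the offset column
xgb_predict_offset(mod, x, "off")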


[Package offsetreg version 1.1.0 Index]