copulaboost {copulaboost}R Documentation

copulaboost

Description

This is the main function of the package, which fits an additive model with a fixed number of components, each involving a fixed number of covariates, where each component is a copula regression model.

Usage

copulaboost(
  y,
  x,
  cov_types,
  n_models = 100,
  n_covs = 5,
  learning_rate = 0.33,
  eps = 0.05,
  verbose = FALSE,
  cont_method = "Localmedian",
  family_set = c("gaussian", "clayton", "gumbel"),
  jitter_sel = TRUE,
  ml_update = FALSE,
  ml_sel = FALSE,
  max_ml_scale = 1,
  keep_sel_struct = TRUE,
  approx_order = 2,
  parametric_margs = TRUE,
  parallel = FALSE,
  par_method_sel = "itau",
  update_intercept = TRUE,
  model = NULL,
  xtreme = FALSE
)

Arguments

y

A vector of n observations of the (univariate) binary outcome variable y

x

A (n x p) matrix of n observations of p covariates

cov_types

A vector of p characters that have to take the value "c" or "d" to indicate whether each margin of the covariates is discrete or continuous.

n_models

The number of model components to fit.

n_covs

The number of covariates included in each component.

learning_rate

Factor to scale (down) the each component.

eps

Control parameter for the approximation to the conditional expectation (the prediction) for each copula model (component), which splits the interval [-1, 1] into equal pieces of eps length.

verbose

Logical indicator of whether a progressbar should be shown in the terminal.

cont_method

Method to use for the approximation of each conditional expectation, can either be "Localmedian" or "Trapezoidalsurv", for the former, see section 3.2 of https://arxiv.org/ftp/arxiv/papers/2208/2208.04669.pdf. The latter uses the so called "Darth vader rule" in conjuction with a simple translative transformation to write the conditional expectation as an integral along the conditional survival function, which is then approximated by the trapezoidal method.

family_set

A vector of strings that specifies the set of pair-copula families that the fitting algorithm chooses from. For an overview of which values that can be specified, see the documentation for bicop.

jitter_sel

Logical indicator of whether jittering should be used for any discrete covariates when selecting the variables for each component (improves computational speed).

ml_update

Logical indicator of whether each new component should be scaled by a number between 0 and max_ml_scale by maximising the log-likelihood of the scaling factor given the current model and the new component.

ml_sel

The same as ml_update, but for the variable selection algorithm.

max_ml_scale

The maximum scaling factor allowed for each component.

keep_sel_struct

Logical indicator of whether the d-vine structures found by the model selection algorithm should be kept when fitting the components.

approx_order

The order of the approximation used for evaluating the conditional expectations when selecting covariates for each component. The allowed values for approx_order are 1, 2, 3, 4, 5, and 6.

parametric_margs

Logical indicator of whether parametric (gaussian or bernoulli) models should be used for the marginal distributions of the covariates.

parallel

(Experimental) Logical indicator of whether parallelization should be used when selecting covariates.

par_method_sel

Estimation method for copulas used when selecting the model components, either "itau" or "mle", see the documentation for bicop.

update_intercept

Logical indicator of whether the intercept parameter should be updated (by univariate maximum likelihood) after each component is added.

model

Initial copulaboost-model. If model is a copulaboost model with k components, the resulting model will have k + n_models components.

xtreme

(Experimental) Logical indicator of whether a second order expansion of the log-likelihood should be used in each gradient boosting step, similar to the xgboost algorithm.

Value

A copulaboost object, which contains a nested list 'object$model' which contains all of the model components. The first element of each list contains a copulareg object, and the second element contains a vector listing the indexes of the covariates that are a part of the component. The object also contains a list of the updated intercepts 'object$f0_updated' at each stage of the fitting process, so that the j-th intercept is the intercept for the model that is the weighted sum of the j first components. 'object$scaling' contains a vector of weights for each components, equal to the learning rate, possibly multiplied by an individual factor if ml_update = TRUE. In addition the object contains the values of the arguments learning_rate, cov_types, and eps that where used when calling copulaboost().

Examples

# Compile some test data
data('ChickWeight')
set.seed(10)
tr <- sample(c(TRUE, FALSE), nrow(ChickWeight), TRUE, c(0.7, 0.3))
y_tr <- as.numeric(ChickWeight$weight[tr] > 100)
y_te <- as.numeric(ChickWeight$weight[!tr] > 100)
x_tr <- apply(ChickWeight[tr, -1], 2, as.numeric)
x_te <- apply(ChickWeight[!tr, -1], 2, as.numeric)
cov_types <- apply(x_tr, 2,
                   function(x) if(length(unique(x)) < 10) "d" else "c")

# Fit model to training data
md <- copulaboost::copulaboost(y_tr, x_tr, cov_types, n_covs = 2, 
                               n_models = 5, verbose = TRUE)

# Out of sample predictions for a new data matrix
preds <- predict(md, new_x = x_te, all_parts = TRUE)

# Plot log-likelihood
plot(apply(preds, 2,
           function(eta) {
             sum(stats::dbinom(y_te, 1, stats::plogis(eta), log = TRUE))
             }),
     type = "s")


[Package copulaboost version 0.1.0 Index]