R: copulaboost

copulaboost {copulaboost}

R Documentation

copulaboost

Description

This is the main function of the package. It fits an additive model with a fixed number of components, each of which is a copula regression model involving a fixed number of covariates.

Usage

copulaboost(
  y,
  x,
  cov_types,
  n_models = 100,
  n_covs = 5,
  learning_rate = 0.33,
  eps = 0.05,
  verbose = FALSE,
  cont_method = "Localmedian",
  family_set = c("gaussian", "clayton", "gumbel"),
  jitter_sel = TRUE,
  ml_update = FALSE,
  ml_sel = FALSE,
  max_ml_scale = 1,
  keep_sel_struct = TRUE,
  approx_order = 2,
  parametric_margs = TRUE,
  parallel = FALSE,
  par_method_sel = "itau",
  update_intercept = TRUE,
  model = NULL,
  xtreme = FALSE
)

Arguments

`y`	A vector of n observations of the (univariate) binary outcome variable y
`x`	A (n x p) matrix of n observations of p covariates
`cov_types`	A character vector of length p whose elements must be either "c" or "d", indicating whether the corresponding covariate margin is continuous ("c") or discrete ("d").
`n_models`	The number of model components to fit.
`n_covs`	The number of covariates included in each component.
`learning_rate`	Factor used to scale (down) each component.
`eps`	Control parameter for the approximation to the conditional expectation (the prediction) of each copula model (component). The interval [-1, 1] is split into equal pieces of length eps.
`verbose`	Logical indicator of whether a progress bar should be shown in the terminal.
`cont_method`	Method used to approximate each conditional expectation, either "Localmedian" or "Trapezoidalsurv". For the former, see section 3.2 of https://arxiv.org/ftp/arxiv/papers/2208/2208.04669.pdf. The latter uses the so-called "Darth Vader rule" in conjunction with a simple translative transformation to write the conditional expectation as an integral of the conditional survival function, which is then approximated by the trapezoidal method.
`family_set`	A vector of strings that specifies the set of pair-copula families that the fitting algorithm chooses from. For an overview of the values that can be specified, see the documentation for bicop.
`jitter_sel`	Logical indicator of whether jittering should be used for any discrete covariates when selecting the variables for each component (improves computational speed).
`ml_update`	Logical indicator of whether each new component should be scaled by a number between 0 and max_ml_scale by maximising the log-likelihood of the scaling factor given the current model and the new component.
`ml_sel`	The same as ml_update, but for the variable selection algorithm.
`max_ml_scale`	The maximum scaling factor allowed for each component.
`keep_sel_struct`	Logical indicator of whether the d-vine structures found by the model selection algorithm should be kept when fitting the components.
`approx_order`	The order of the approximation used for evaluating the conditional expectations when selecting covariates for each component. The allowed values for approx_order are 1, 2, 3, 4, 5, and 6.
`parametric_margs`	Logical indicator of whether parametric (Gaussian or Bernoulli) models should be used for the marginal distributions of the covariates.
`parallel`	(Experimental) Logical indicator of whether parallelization should be used when selecting covariates.
`par_method_sel`	Estimation method for the copulas used when selecting the model components, either "itau" or "mle". See the documentation for bicop.
`update_intercept`	Logical indicator of whether the intercept parameter should be updated (by univariate maximum likelihood) after each component is added.
`model`	Initial copulaboost-model. If model is a copulaboost model with k components, the resulting model will have k + n_models components.
`xtreme`	(Experimental) Logical indicator of whether a second order expansion of the log-likelihood should be used in each gradient boosting step, similar to the xgboost algorithm.
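
As a rough illustration of the idea behind cont_method = "Trapezoidalsurv" and the role of eps, the sketch below computes the mean of a variable supported on [-1, 1] by shifting it to be non-negative, writing the mean as an integral of the survival function, and applying the trapezoidal rule on a grid with spacing eps. This is not the package's internal code: the helper trapz_surv_mean, the function surv_x, and the Beta-based toy distribution are assumptions made only for this example.

```r
# Illustrative sketch only; not copulaboost's internal implementation.
# For X supported on [lower, upper], shifting by `lower` gives a non-negative
# variable, so E[X] = lower + integral of S(x) over [lower, upper],
# where S(x) = P(X > x) is the survival function.
trapz_surv_mean <- function(surv, lower = -1, upper = 1, eps = 0.05) {
  # Split [lower, upper] into equal pieces of (approximately) length eps
  n_pieces <- round((upper - lower) / eps)
  grid <- seq(lower, upper, length.out = n_pieces + 1)
  s <- surv(grid)
  # Trapezoidal rule: average of neighbouring function values times the spacing
  lower + sum(diff(grid) * (s[-length(s)] + s[-1]) / 2)
}

# Toy distribution (hypothetical): X = 2 * B - 1 with B ~ Beta(2, 3)
surv_x <- function(x) 1 - stats::pbeta((x + 1) / 2, 2, 3)

est <- trapz_surv_mean(surv_x, eps = 0.05)
exact <- 2 * (2 / 5) - 1  # E[X] = 2 * E[B] - 1
abs(est - exact)          # approximation error of the trapezoidal rule
```

A smaller eps gives a finer grid and hence a more accurate approximation, at the cost of more evaluations of the survival function.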

Value

A copulaboost object containing a nested list 'object$model' that holds all of the model components. The first element of each list is a copulareg object, and the second element is a vector of the indexes of the covariates that are part of the component. The object also contains a list of the updated intercepts, 'object$f0_updated', from each stage of the fitting process, so that the j-th intercept is the intercept for the model that is the weighted sum of the first j components. 'object$scaling' contains a vector of weights, one for each component, equal to the learning rate, possibly multiplied by an individual factor if ml_update = TRUE. In addition, the object contains the values of the arguments learning_rate, cov_types, and eps that were used when calling copulaboost().
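
The following sketch shows one way to read the relationship between the components, the weights, and the stagewise intercepts described above: the linear predictor at stage j is the j-th intercept plus the weighted sum of the first j component predictions. The objects comp_preds, scaling, and f0_updated below are hand-made stand-ins, not output from copulaboost(), and the use of the logistic link is an assumption based on the binary outcome.

```r
# Hand-made stand-ins (hypothetical values, not produced by copulaboost):
# predictions of 3 components for 2 observations, one column per component
comp_preds <- matrix(c( 1, -1,
                        2,  0,
                        0,  4), nrow = 2)
scaling    <- c(0.5, 0.5, 0.25)  # weight of each component
f0_updated <- c(0.1, 0.2, 0.3)   # intercept after each stage

# Scale each component (column) by its weight
weighted <- sweep(comp_preds, 2, scaling, "*")

# Cumulative sum over components gives the weighted sum of the first j components
cum_sum <- t(apply(weighted, 1, cumsum))

# Add the stage-specific intercept to obtain the linear predictor at each stage
eta <- sweep(cum_sum, 2, f0_updated, "+")

# Assumed logistic link: success probabilities at each stage
prob <- stats::plogis(eta)
eta
```

Each column of eta then corresponds to the model built from the first j components, so the last column is the linear predictor of the full model.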

Examples

# Compile some test data
data('ChickWeight')
set.seed(10)
tr <- sample(c(TRUE, FALSE), nrow(ChickWeight), TRUE, c(0.7, 0.3))
y_tr <- as.numeric(ChickWeight$weight[tr] > 100)
y_te <- as.numeric(ChickWeight$weight[!tr] > 100)
x_tr <- apply(ChickWeight[tr, -1], 2, as.numeric)
x_te <- apply(ChickWeight[!tr, -1], 2, as.numeric)
cov_types <- apply(x_tr, 2,
                   function(x) if(length(unique(x)) < 10) "d" else "c")

# Fit model to training data
md <- copulaboost::copulaboost(y_tr, x_tr, cov_types, n_covs = 2, 
                               n_models = 5, verbose = TRUE)

# Out of sample predictions for a new data matrix
preds <- predict(md, new_x = x_te, all_parts = TRUE)

# Plot log-likelihood
plot(apply(preds, 2,
           function(eta) {
             sum(stats::dbinom(y_te, 1, stats::plogis(eta), log = TRUE))
             }),
     type = "s")
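
The log-likelihood curve above can also be used to pick the number of components. The sketch below assumes that each column of a prediction matrix holds the linear predictor after one more component has been added, as the plot above suggests. The helper stage_loglik and the toy inputs y_toy and eta_toy are hypothetical and are used here so that the snippet runs without a fitted model.

```r
# Hypothetical helper (not part of copulaboost): Bernoulli log-likelihood
# of a binary outcome for each column (stage) of a matrix of linear predictors
stage_loglik <- function(y, eta_mat) {
  apply(eta_mat, 2, function(eta) {
    sum(stats::dbinom(y, 1, stats::plogis(eta), log = TRUE))
  })
}

# Toy stand-in for a prediction matrix: 4 observations x 3 stages
y_toy <- c(1, 0, 1, 0)
eta_toy <- cbind(c( 0,  0,  0,  0),
                 c( 2, -2,  2, -2),
                 c(-1,  1, -1,  1))

ll <- stage_loglik(y_toy, eta_toy)

# Stage with the highest log-likelihood
best_stage <- which.max(ll)
```

With a fitted model, the same helper could be applied as stage_loglik(y_te, preds), preferably on data not used for fitting.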

[Package copulaboost version 0.1.0 Index]