copulaboost {copulaboost} | R Documentation |
copulaboost
Description
This is the main function of the package, which fits an additive model with a fixed number of components, each involving a fixed number of covariates, where each component is a copula regression model.
Usage
copulaboost(
  y,
  x,
  cov_types,
  n_models = 100,
  n_covs = 5,
  learning_rate = 0.33,
  eps = 0.05,
  verbose = FALSE,
  cont_method = "Localmedian",
  family_set = c("gaussian", "clayton", "gumbel"),
  jitter_sel = TRUE,
  ml_update = FALSE,
  ml_sel = FALSE,
  max_ml_scale = 1,
  keep_sel_struct = TRUE,
  approx_order = 2,
  parametric_margs = TRUE,
  parallel = FALSE,
  par_method_sel = "itau",
  update_intercept = TRUE,
  model = NULL,
  xtreme = FALSE
)
Arguments
y |
A vector of n observations of the (univariate) binary outcome variable y |
x |
A (n x p) matrix of n observations of p covariates |
cov_types |
A character vector of length p whose elements must be either "c" or "d", indicating whether the corresponding covariate margin is continuous or discrete, respectively. |
n_models |
The number of model components to fit. |
n_covs |
The number of covariates included in each component. |
learning_rate |
Factor by which each component is scaled (down). |
eps |
Control parameter for the approximation of the conditional expectation (the prediction) of each copula model (component): the interval [-1, 1] is split into equal pieces of length eps. |
verbose |
Logical indicator of whether a progress bar should be shown in the terminal. |
cont_method |
Method used to approximate each conditional expectation, either "Localmedian" or "Trapezoidalsurv". For the former, see Section 3.2 of https://arxiv.org/ftp/arxiv/papers/2208/2208.04669.pdf. The latter uses the so-called "Darth Vader rule" in conjunction with a simple translation to write the conditional expectation as an integral of the conditional survival function, which is then approximated by the trapezoidal rule. |
family_set |
A vector of strings that specifies the set of pair-copula families from which the fitting algorithm chooses. For an overview of the values that can be specified, see the documentation for bicop. |
jitter_sel |
Logical indicator of whether jittering should be used for any discrete covariates when selecting the variables for each component (improves computational speed). |
ml_update |
Logical indicator of whether each new component should be scaled by a number between 0 and max_ml_scale by maximising the log-likelihood of the scaling factor given the current model and the new component. |
ml_sel |
The same as ml_update, but for the variable selection algorithm. |
max_ml_scale |
The maximum scaling factor allowed for each component. |
keep_sel_struct |
Logical indicator of whether the d-vine structures found by the model selection algorithm should be kept when fitting the components. |
approx_order |
The order of the approximation used for evaluating the conditional expectations when selecting covariates for each component. The allowed values for approx_order are 1, 2, 3, 4, 5, and 6. |
parametric_margs |
Logical indicator of whether parametric (Gaussian or Bernoulli) models should be used for the marginal distributions of the covariates. |
parallel |
(Experimental) Logical indicator of whether parallelization should be used when selecting covariates. |
par_method_sel |
Estimation method for copulas used when selecting the model components, either "itau" or "mle", see the documentation for bicop. |
update_intercept |
Logical indicator of whether the intercept parameter should be updated (by univariate maximum likelihood) after each component is added. |
model |
Initial copulaboost-model. If model is a copulaboost model with k components, the resulting model will have k + n_models components. |
xtreme |
(Experimental) Logical indicator of whether a second order expansion of the log-likelihood should be used in each gradient boosting step, similar to the xgboost algorithm. |
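The "Trapezoidalsurv" idea described under cont_method can be illustrated with a small base R sketch. It is not the package's implementation: the toy distribution, the function name surv, and the variable names are assumptions made for illustration. A variable Y on [-1, 1] is shifted to Z = Y + 1 >= 0, whose mean is the integral of its survival function, so that E[Y] = -1 + the integral of S(y) over [-1, 1]. The integral is evaluated with the trapezoidal rule on a grid of step eps.

```r
# Toy distribution on [-1, 1] (an assumption for illustration):
# Y = 2 * B - 1 with B ~ Beta(2, 3), so the exact mean is 2 * (2 / 5) - 1 = -0.2.
surv <- function(y) 1 - stats::pbeta((y + 1) / 2, shape1 = 2, shape2 = 3)

eps  <- 0.05
grid <- seq(-1, 1, by = eps)   # [-1, 1] split into pieces of length eps
s    <- surv(grid)             # survival function on the grid

# Trapezoidal rule for the integral of S(y) over [-1, 1]
trapezoid <- sum(diff(grid) * (s[-length(s)] + s[-1]) / 2)

# Undo the shift Z = Y + 1 to obtain the mean of Y
approx_mean <- -1 + trapezoid
print(approx_mean)
```

In a fitted model, the survival function would come from the conditional distribution implied by the copula component rather than from a known Beta distribution.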
Value
A copulaboost object containing a nested list 'object$model' that holds all of the model components. The first element of each list is a copulareg object, and the second element is a vector of the indexes of the covariates that are part of the component. The object also contains a list of the updated intercepts, 'object$f0_updated', one for each stage of the fitting process, so that the j-th intercept is the intercept of the model formed by the weighted sum of the first j components. 'object$scaling' is a vector of weights, one per component, each equal to the learning rate, possibly multiplied by an individual factor if ml_update = TRUE. In addition, the object contains the values of the arguments learning_rate, cov_types, and eps that were used when calling copulaboost().
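How the stage-wise intercepts and weights relate to each other can be sketched in base R without the package. Everything below is a toy under stated assumptions: the component outputs h are simulated stand-ins for fitted copula regression components, a logit link is assumed (as in the Examples), the starting intercept is assumed to be the logit of the outcome mean, and the local vectors scaling and f0_updated only mimic the fields of the same names described above.

```r
set.seed(1)
n      <- 200
signal <- stats::rnorm(n)
y      <- stats::rbinom(n, 1, stats::plogis(signal))

# Hypothetical component outputs: noisy versions of the signal, standing in
# for the predictions of three fitted components.
h <- sapply(1:3, function(j) signal + stats::rnorm(n))

learning_rate <- 0.33
max_ml_scale  <- 1
loglik <- function(eta) sum(stats::dbinom(y, 1, stats::plogis(eta), log = TRUE))

f0             <- stats::qlogis(mean(y))  # assumed starting intercept
eta_components <- rep(0, n)               # weighted sum of components so far
scaling        <- numeric(3)
f0_updated     <- numeric(3)

for (j in 1:3) {
  # ml_update-style step: factor in [0, max_ml_scale] maximising the log-likelihood
  nu <- stats::optimize(function(v) loglik(f0 + eta_components + v * h[, j]),
                        interval = c(0, max_ml_scale), maximum = TRUE)$maximum
  scaling[j]     <- learning_rate * nu
  eta_components <- eta_components + scaling[j] * h[, j]

  # update_intercept-style step: univariate maximum likelihood for the intercept
  f0 <- stats::optimize(function(b) loglik(b + eta_components),
                        interval = c(-10, 10), maximum = TRUE)$maximum
  f0_updated[j] <- f0
}

# Linear predictor of the model made up of the first j components
eta_stage <- function(j) {
  f0_updated[j] + drop(h[, 1:j, drop = FALSE] %*% scaling[1:j])
}
```

Here eta_stage(j) pairs the j-th intercept with the weighted sum of the first j components, which is the relationship the description of 'object$f0_updated' states.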
Examples
# Compile some test data
data('ChickWeight')
set.seed(10)
tr <- sample(c(TRUE, FALSE), nrow(ChickWeight), TRUE, c(0.7, 0.3))
y_tr <- as.numeric(ChickWeight$weight[tr] > 100)
y_te <- as.numeric(ChickWeight$weight[!tr] > 100)
x_tr <- apply(ChickWeight[tr, -1], 2, as.numeric)
x_te <- apply(ChickWeight[!tr, -1], 2, as.numeric)
cov_types <- apply(x_tr, 2,
                   function(x) if (length(unique(x)) < 10) "d" else "c")
# Fit model to training data
md <- copulaboost::copulaboost(y_tr, x_tr, cov_types, n_covs = 2,
                               n_models = 5, verbose = TRUE)
# Out of sample predictions for a new data matrix
preds <- predict(md, new_x = x_te, all_parts = TRUE)
# Plot the out-of-sample log-likelihood after each component is added
plot(apply(preds, 2,
           function(eta) {
             sum(stats::dbinom(y_te, 1, stats::plogis(eta), log = TRUE))
           }),
     type = "s")
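
The per-stage log-likelihood plotted above can also be used to choose how many components to keep. The following is a minimal sketch, assuming (as the column-wise apply() in the example suggests) that the prediction matrix has one column of linear predictors per stage. The helper stage_loglik and the toy inputs y_demo and eta_demo are hypothetical and only stand in for y_te and preds.

```r
# Held-out Bernoulli log-likelihood for each column (stage) of a matrix of
# linear predictors, assuming a logit link as in the example above.
stage_loglik <- function(eta_mat, y) {
  apply(eta_mat, 2, function(eta) {
    sum(stats::dbinom(y, 1, stats::plogis(eta), log = TRUE))
  })
}

# Hypothetical stand-in for the prediction matrix: 4 observations, 3 stages.
y_demo   <- c(1, 0, 1, 0)
eta_demo <- cbind(c(0, 0, 0, 0),
                  c(1, -1, 1, -1),
                  c(3, -3, -3, 3))

ll         <- stage_loglik(eta_demo, y_demo)
best_stage <- which.max(ll)   # stage with the highest held-out log-likelihood
print(ll)
print(best_stage)
```

With real output, stage_loglik(preds, y_te) would give the values that the plot displays, and which.max() of that vector would pick the stage to stop at.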