copulaboost {copulaboost} | R Documentation |
This is the main function of the package, which fits an additive model with a fixed number of components, each involving a fixed number of covariates, where each component is a copula regression model.
copulaboost(
  y,
  x,
  cov_types,
  n_models = 100,
  n_covs = 5,
  learning_rate = 0.33,
  eps = 0.05,
  verbose = FALSE,
  cont_method = "Localmedian",
  family_set = c("gaussian", "clayton", "gumbel"),
  jitter_sel = TRUE,
  ml_update = FALSE,
  ml_sel = FALSE,
  max_ml_scale = 1,
  keep_sel_struct = TRUE,
  approx_order = 2,
  parametric_margs = TRUE,
  parallel = FALSE,
  par_method_sel = "itau",
  update_intercept = TRUE,
  model = NULL,
  xtreme = FALSE
)
y |
A vector of n observations of the (univariate) binary outcome variable y |
x |
A (n x p) matrix of n observations of p covariates |
cov_types |
A character vector of length p whose elements must be either "c" or "d", indicating whether the margin of each covariate is continuous ("c") or discrete ("d"). |
n_models |
The number of model components to fit. |
n_covs |
The number of covariates included in each component. |
learning_rate |
Factor by which each component is scaled (down). |
eps |
Control parameter for the approximation to the conditional expectation (the prediction) for each copula model (component), which splits the interval [-1, 1] into equal pieces of eps length. |
verbose |
Logical indicator of whether a progress bar should be shown in the terminal. |
cont_method |
Method used to approximate each conditional expectation; either "Localmedian" or "Trapezoidalsurv". For the former, see section 3.2 of [reference missing in the source]. The latter uses the so-called "Darth Vader rule" in conjunction with a simple translation to write the conditional expectation as an integral of the conditional survival function, which is then approximated by the trapezoidal method. |
family_set |
A vector of strings that specifies the set of pair-copula families that the fitting algorithm chooses from. For an overview of the values that can be specified, see the documentation for bicop. |
jitter_sel |
Logical indicator of whether jittering should be used for any discrete covariates when selecting the variables for each component (improves computational speed). |
ml_update |
Logical indicator of whether each new component should be scaled by a number between 0 and max_ml_scale by maximising the log-likelihood of the scaling factor given the current model and the new component. |
ml_sel |
The same as ml_update, but for the variable selection algorithm. |
max_ml_scale |
The maximum scaling factor allowed for each component. |
keep_sel_struct |
Logical indicator of whether the D-vine structures found by the model selection algorithm should be kept when fitting the components. |
approx_order |
The order of the approximation used for evaluating the conditional expectations when selecting covariates for each component. The allowed values for approx_order are 1, 2, 3, 4, 5, and 6. |
parametric_margs |
Logical indicator of whether parametric (Gaussian or Bernoulli) models should be used for the marginal distributions of the covariates. |
parallel |
(Experimental) Logical indicator of whether parallelization should be used when selecting covariates. |
par_method_sel |
Estimation method for the copulas used when selecting the model components, either "itau" or "mle"; see the documentation for bicop. |
update_intercept |
Logical indicator of whether the intercept parameter should be updated (by univariate maximum likelihood) after each component is added. |
model |
Initial copulaboost-model. If model is a copulaboost model with k components, the resulting model will have k + n_models components. |
xtreme |
(Experimental) Logical indicator of whether a second order expansion of the log-likelihood should be used in each gradient boosting step, similar to the xgboost algorithm. |
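The "Trapezoidalsurv" option of cont_method and the eps grid described above can be illustrated with a small base-R sketch. It is not the package's implementation: the survival function below is a hypothetical stand-in (a Beta(2, 5) variable rescaled to [-1, 1]), chosen only because its mean is known exactly. Shifting Y by +1 makes it nonnegative, so E[Y] = -1 + the integral of S(y) = P(Y > y) over [-1, 1], which is then approximated by the trapezoidal rule.

```r
# Sketch only: survival-function ("Darth Vader rule") expectation on [-1, 1].
eps <- 0.05
grid <- seq(-1, 1, by = eps)  # equal pieces of length eps

# Hypothetical stand-in for a conditional survival function:
# Y = 2 * B - 1 with B ~ Beta(2, 5), so S(y) = P(Y > y).
surv <- function(y) 1 - stats::pbeta((y + 1) / 2, 2, 5)

s <- surv(grid)
# Trapezoidal rule on the equally spaced grid
integral <- sum((s[-1] + s[-length(s)]) / 2) * eps
expectation_approx <- -1 + integral

# Exact mean of Y for comparison: 2 * E[B] - 1 with E[B] = 2 / 7
expectation_exact <- 2 * (2 / 7) - 1
```

A smaller eps gives a finer grid and a closer approximation, at the cost of more evaluations of the survival function.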
A copulaboost object. It contains a nested list 'object$model' holding all of the model components: the first element of each inner list is a copulareg object, and the second is a vector of the indexes of the covariates that are part of the component. The object also contains a list of the updated intercepts, 'object$f0_updated', one for each stage of the fitting process, so that the j-th intercept is the intercept for the model that is the weighted sum of the first j components. 'object$scaling' contains a vector of weights, one for each component, equal to the learning rate, possibly multiplied by an individual factor if ml_update = TRUE. In addition, the object contains the values of the arguments learning_rate, cov_types, and eps that were used when calling copulaboost().
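To make the roles of 'f0_updated' and 'scaling' concrete, the sketch below builds a mock list with those two field names and combines them with made-up component predictions. Everything here is an assumption for illustration: the mock is not produced by copulaboost(), comp_preds is a hypothetical matrix of per-component predictions, and the logit link is taken from the example, which passes the predictions through plogis().

```r
# Mock list mirroring two documented fields; NOT a real copulaboost object.
set.seed(1)
n_obs <- 4
n_comp <- 3
# Hypothetical per-component predictions, one column per component
comp_preds <- matrix(rnorm(n_obs * n_comp), n_obs, n_comp)
mock <- list(
  f0_updated = c(-0.2, -0.1, 0.05),  # intercept after each stage
  scaling = rep(0.33, n_comp)        # learning_rate, no ml_update factor
)

# Stage j: the j-th intercept plus the weighted sum of the first j components
eta <- sapply(seq_len(n_comp), function(j) {
  idx <- seq_len(j)
  mock$f0_updated[j] +
    drop(comp_preds[, idx, drop = FALSE] %*% mock$scaling[idx])
})

# Probabilities under the assumed logit link
probs <- stats::plogis(eta)
```

Each column of eta corresponds to one stage of the fit, so the last column is the linear predictor of the full mock model.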
# Compile some test data
tr <- sample(c(TRUE, FALSE), nrow(ChickWeight), TRUE, c(0.7, 0.3))
y_tr <- as.numeric(ChickWeight$weight[tr] > 100)
y_te <- as.numeric(ChickWeight$weight[!tr] > 100)
x_tr <- apply(ChickWeight[tr, -1], 2, as.numeric)
x_te <- apply(ChickWeight[!tr, -1], 2, as.numeric)
cov_types <- apply(x_tr, 2,
                   function(x) if (length(unique(x)) < 10) "d" else "c")
# Fit model to training data
md <- copulaboost::copulaboost(y_tr, x_tr, cov_types, n_covs = 2,
                               n_models = 5, verbose = TRUE)
# Out of sample predictions for a new data matrix
preds <- predict(md, new_x = x_te, all_parts = TRUE)
# Plot log-likelihood
plot(apply(preds, 2,
           function(eta) {
             sum(stats::dbinom(y_te, 1, stats::plogis(eta), log = TRUE))
           }),
     type = "s")
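The ml_update, max_ml_scale, and update_intercept arguments describe one-dimensional maximum-likelihood updates. The sketch below shows what such updates could look like using stats::optimize() and the same Bernoulli log-likelihood as in the plot above. It is an assumption-laden illustration, not the package's code: current_eta and new_comp are simulated placeholders, and the search interval for the intercept is arbitrary.

```r
# Sketch only: 1-D maximum-likelihood scaling and intercept updates.
set.seed(42)
n <- 200
current_eta <- rnorm(n)  # hypothetical current linear predictor
new_comp <- rnorm(n)     # hypothetical prediction from a new component
y <- rbinom(n, 1, stats::plogis(current_eta + 0.5 * new_comp))

# Bernoulli log-likelihood for a linear predictor on the logit scale
loglik <- function(eta) {
  sum(stats::dbinom(y, 1, stats::plogis(eta), log = TRUE))
}

max_ml_scale <- 1
# Scaling factor between 0 and max_ml_scale that maximises the log-likelihood
scale_fit <- stats::optimize(
  function(s) loglik(current_eta + s * new_comp),
  interval = c(0, max_ml_scale), maximum = TRUE
)
scaled_eta <- current_eta + scale_fit$maximum * new_comp

# Univariate maximum-likelihood update of an intercept shift
# (the interval c(-5, 5) is an arbitrary choice for this sketch)
f0_fit <- stats::optimize(
  function(f0) loglik(f0 + scaled_eta),
  interval = c(-5, 5), maximum = TRUE
)
```

Here scale_fit$maximum is the fitted scaling factor and f0_fit$maximum the fitted intercept shift; both log-likelihoods are concave in the single parameter, so a bounded one-dimensional search is sufficient.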