fit_copula_interactions {LogisticCopula}    R Documentation

fit_copula_interactions

Description

This is the main function of the package. Starting from an initial logistic regression model with only main effects for each covariate, it selects and fits interaction terms in the form of two R-vine models with identical graphical structure, one for each class.

Usage

fit_copula_interactions(
  y,
  x,
  xtype,
  family_set = c("gaussian", "clayton", "gumbel"),
  oos_validation = FALSE,
  tau = 2,
  which_include = NULL,
  reg.method = "glm",
  maxit_final = 1000,
  maxit_intermediate = 50,
  verbose = FALSE,
  adjust_intercept = TRUE,
  max_t = Inf,
  test_x = NULL,
  test_y = NULL,
  set_nonsig_zero = FALSE,
  reltol = sqrt(.Machine$double.eps)
)

Arguments

y

A vector of n observations of the (univariate) binary outcome variable y

x

A (n x p) matrix of n observations of p covariates

xtype

A character vector of length p whose entries must each be "c_a", "c_p", "d_b" or "d_c", indicating whether the corresponding covariate is continuous with full support, continuous with support on the positive real line, discrete (binary) or a counting variable, respectively.
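As a rough illustration of how such a vector might be assembled, the sketch below labels each column of a covariate matrix from its observed values. The helper name guess_xtype and its rules are assumptions made for this example and are not part of the package. Counting variables are not detected and would need to be labelled by hand.

```r
# Hypothetical helper (not part of LogisticCopula): guess a margin type
# for each column of a numeric matrix from its observed values.
guess_xtype <- function(x) {
  apply(x, 2, function(col) {
    if (length(unique(col)) <= 2) {
      "d_b"   # at most two distinct values: treat as binary
    } else if (all(col > 0)) {
      "c_p"   # strictly positive: support on the positive real line
    } else {
      "c_a"   # otherwise: continuous with full support
    }
  })
}

set.seed(1)
toy <- cbind(
  a = rnorm(50),            # takes both signs
  b = rexp(50),             # strictly positive
  d = rbinom(50, 1, 0.5)    # binary
)
xtype_toy <- guess_xtype(toy)
print(xtype_toy)
```

A data-driven guess like this is only a starting point: a column that happens to be positive in the sample would be labelled "c_p" even if its true support is the whole real line, so the result should be checked against what is known about each covariate.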

family_set

A vector of strings that specifies the set of pair-copula families that the fitting algorithm chooses from. For an overview of the values that can be specified, see the documentation for bicop.

oos_validation

Whether to use an external sample for validation instead of an in-sample likelihood-based criterion. If set to TRUE, both test_x and test_y must be provided.

tau

Penalty used when selecting the structure. The criterion is (new_likelihood - previous_likelihood - tau), so an additional edge in the copulas is only accepted if it increases the likelihood by more than tau. Setting tau to NULL has the same effect as setting it to -Inf.
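The acceptance rule can be sketched in a few lines of base R. The function accept_edge below is a hypothetical helper written for this illustration, and the numeric values are made up (assumed here to be on the log-likelihood scale); the package's internal implementation may differ.

```r
# Hypothetical helper illustrating the acceptance rule described above;
# not the package's internal code.
accept_edge <- function(new_likelihood, previous_likelihood, tau = 2) {
  if (is.null(tau)) tau <- -Inf   # NULL behaves like -Inf
  (new_likelihood - previous_likelihood - tau) > 0
}

# Illustrative values only (assumed to be log-likelihoods).
decisions <- c(
  big_gain   = accept_edge(-120.0, -123.5),              # gain 3.5 exceeds tau = 2
  small_gain = accept_edge(-122.5, -123.5),              # gain 1.0 does not
  null_tau   = accept_edge(-124.0, -123.5, tau = NULL),  # accepted despite a loss
  inf_tau    = accept_edge(-100.0, -123.5, tau = Inf)    # never accepted
)
print(decisions)
```

The last case shows why tau = Inf reduces the fit to the plain logistic regression model: no candidate edge can ever satisfy the criterion.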

which_include

The column indices of the covariates that could be included in the copula effects.

reg.method

The method by which the initial regression coefficients are fitted.

maxit_final

The maximum number of gradient optimisation iterations to use when the full structure has been selected to refit all the parameters. Defaults to 1000.

maxit_intermediate

The maximum number of gradient optimisation iterations to use when refitting the parameters after a newly selected component has been added. Defaults to 50.

verbose

Whether information about the progress should be printed to the console.

adjust_intercept

Whether to refit the intercept at intermediate steps of the model/structure selection procedure. Defaults to TRUE.

max_t

The maximum number of trees in the copula models. Defaults to Inf, i.e., no maximum.

test_x

Covariate matrix of the optional validation set, see oos_validation.

test_y

Outcome vector of the optional validation set, see oos_validation.

set_nonsig_zero

If TRUE, non-significant regression coefficients (in the initial glm model) are set to zero.

reltol

Relative convergence tolerance, see the documentation for optim.
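To make the roles of the iteration limits and reltol concrete, the sketch below runs optim on a standard test function with a small and a large iteration budget. The objective and the choice of method = "BFGS" are assumptions made for this illustration; the page does not state which optim method the package uses.

```r
# Rosenbrock test function, a standard optimisation benchmark
# (used here only for illustration).
rosenbrock <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2
start <- c(-1.2, 1)
tol <- sqrt(.Machine$double.eps)   # about 1.5e-8, also optim's own default

# Small iteration budget, in the spirit of maxit_intermediate.
short_fit <- optim(start, rosenbrock, method = "BFGS",
                   control = list(maxit = 5, reltol = tol))

# Larger budget, in the spirit of maxit_final.
long_fit <- optim(start, rosenbrock, method = "BFGS",
                  control = list(maxit = 1000, reltol = tol))

print(short_fit$convergence)   # 1 indicates the iteration limit was reached
print(long_fit$convergence)    # 0 indicates successful convergence
print(long_fit$par)            # close to the minimum at c(1, 1)
```

A short budget gives a cheap, approximate refit, which is usually enough while candidate components are still being compared; the larger budget is reserved for the final refit of all parameters.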

Value

A logistic_copula object, which contains the regression coefficients of the model, the parameters of the chosen conditional covariate distribution that corresponds to the regression coefficients, and the pair of vine-models that extend the logistic regression model.

Examples

data("Ionosphere")

dset <- Ionosphere[, -(1:2)] 

set.seed(20)
rowss <- sample(nrow(dset), round(nrow(dset) * 0.75))
colss <- sample(ncol(dset) - 1, 5)
x <- as.matrix(dset[rowss, colss])
xte <- as.matrix(dset[-rowss, colss])
y <- dset[rowss, ncol(dset)] == "bad"
yte <- dset[-rowss, ncol(dset)] == "bad"

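# Columns with more than two distinct values are treated as continuous
# with full support ("c_a"), the rest as binary ("d_b")
xtype <- apply(x, 2, function(col) if (length(unique(col)) > 2) "c_a" else "d_b")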

# Model with selection penalty tau=log(n)
md <- LogisticCopula::fit_copula_interactions(
  y, as.matrix(x), xtype, tau = log(nrow(x))
)
# Model with selection penalty tau=Inf, returns just the logistic
# regression model
mdglm <- LogisticCopula::fit_copula_interactions(
  y, as.matrix(x), xtype, tau = Inf
)

plot(predict(mdglm, xte), predict(md, xte), col = 3 + yte)

[Package LogisticCopula version 0.1.0 Index]