cvCovEst {cvCovEst}    R Documentation

Cross-Validated Covariance Matrix Estimator Selector

Description

cvCovEst() identifies the optimal covariance matrix estimator from among a set of candidate estimators.

Usage

cvCovEst(
  dat,
  estimators = c(linearShrinkEst, thresholdingEst, sampleCovEst),
  estimator_params = list(linearShrinkEst = list(alpha = 0), thresholdingEst =
    list(gamma = 0)),
  cv_loss = cvMatrixFrobeniusLoss,
  cv_scheme = "v_fold",
  mc_split = 0.5,
  v_folds = 10L,
  center = TRUE,
  scale = FALSE,
  parallel = FALSE,
  true_cov_mat = NULL
)

Arguments

dat

A numeric data.frame, matrix, or similar object.

estimators

A list of estimator functions to be considered in the cross-validated estimator selection procedure.

estimator_params

A named list of arguments corresponding to the hyperparameters of the covariance matrix estimators in estimators. The name of each list element should match the name of an estimator passed to estimators. Each element of estimator_params is itself a named list, with names corresponding to that estimator's hyperparameter(s). Each hyperparameter may be given as a single numeric or as a numeric vector of candidate values. If an estimator requires no hyperparameters, it need not be listed.
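As a minimal sketch of the naming convention described above, the call below supplies a vector of candidate values for each hyperparameter and omits sampleCovEst, which takes none. The specific alpha and gamma values are illustrative assumptions, not recommended settings.

```r
library(cvCovEst)

# Each element of estimator_params is named after an estimator in
# `estimators`; each inner list is named after that estimator's
# hyperparameter. sampleCovEst has no hyperparameters and is not listed.
fit <- cvCovEst(
  dat = mtcars,
  estimators = c(linearShrinkEst, thresholdingEst, sampleCovEst),
  estimator_params = list(
    linearShrinkEst = list(alpha = c(0.1, 0.5, 0.9)),  # illustrative values
    thresholdingEst = list(gamma = c(0.2, 0.4))        # illustrative values
  )
)
```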

cv_loss

A function indicating the loss function to be used. This defaults to the matrix-based Frobenius loss, cvMatrixFrobeniusLoss(). An observation-based version, cvFrobeniusLoss(), is also available. Additionally, cvScaledMatrixFrobeniusLoss() is provided for situations in which the variables of dat are on different scales.
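A short sketch of selecting a non-default loss: the columns of mtcars are measured in very different units, so the scaled loss may be a reasonable choice here. The pairing of estimators is an assumption made only for illustration.

```r
library(cvCovEst)

# Pass the loss function itself (unquoted), mirroring the default
# cv_loss = cvMatrixFrobeniusLoss shown in Usage.
fit_scaled <- cvCovEst(
  dat = mtcars,
  estimators = c(linearShrinkLWEst, sampleCovEst),
  cv_loss = cvScaledMatrixFrobeniusLoss
)
```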

cv_scheme

A character indicating the cross-validation scheme to be employed. There are two options: (1) V-fold cross-validation, via "v_fold"; and (2) Monte Carlo cross-validation, via "mc". Defaults to V-fold cross-validation, as shown in Usage.

mc_split

A numeric between 0 and 1 indicating the proportion of observations to be included in the validation set of each Monte Carlo cross-validation fold.

v_folds

An integer larger than or equal to 1 indicating the number of folds to use for cross-validation. The default is 10, regardless of the choice of cross-validation scheme.
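The three cross-validation arguments work together. The sketch below requests Monte Carlo cross-validation with half of the observations held out in each of five folds; the fold count and the choice of estimators are illustrative assumptions.

```r
library(cvCovEst)

# Monte Carlo cross-validation: v_folds random splits, each placing a
# proportion mc_split of the observations in the validation set.
fit_mc <- cvCovEst(
  dat = mtcars,
  estimators = c(linearShrinkLWEst, sampleCovEst),
  cv_scheme = "mc",
  mc_split = 0.5,
  v_folds = 5L
)
```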

center

A logical indicating whether to center the columns of dat to have mean zero.

scale

A logical indicating whether to scale the columns of dat to have unit variance.
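To illustrate what centering and scaling do to the target of estimation, the base R sketch below standardizes the columns of mtcars with scale(). It assumes that the preprocessing applied by cvCovEst() is equivalent to base scale(), which this page does not confirm. After centering and scaling, the sample covariance matrix coincides with the sample correlation matrix of the original data.

```r
# Base R only: standardize columns to mean zero and unit variance.
x <- as.matrix(mtcars)
x_std <- scale(x, center = TRUE, scale = TRUE)

# Column means are numerically zero and column standard deviations are one.
max_abs_mean <- max(abs(colMeans(x_std)))
col_sds <- apply(x_std, 2, sd)

# The covariance of the standardized data equals the correlation of the
# original data.
same_as_cor <- isTRUE(all.equal(cov(x_std), cor(x)))
```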

parallel

A logical option indicating whether to run the main cross-validation loop with future_lapply(). This is passed directly to cross_validate().
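Because the parallel loop relies on future_lapply(), a parallel backend is typically registered through the future package before calling cvCovEst(). The sketch below assumes the future package is installed; the multisession backend and the worker count are illustrative choices.

```r
library(cvCovEst)
library(future)

# Register a parallel backend; two workers is an arbitrary choice.
plan(multisession, workers = 2)

fit_par <- cvCovEst(
  dat = mtcars,
  estimators = c(linearShrinkLWEst, sampleCovEst),
  parallel = TRUE
)

# Restore sequential evaluation once finished.
plan(sequential)
```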

true_cov_mat

A matrix-like object giving the true covariance matrix of the data-generating distribution, which is assumed Gaussian. This parameter is intended as an aid for use only in simulation studies, and it defaults to NULL. When not NULL, various conditional risk difference ratios of the estimator selected by cvCovEst() are computed relative to the different oracle selectors. NOTE: This parameter will be phased out by the release of version 1.0.0.

Value

A list of results containing the following elements:

If the true covariance matrix is input through true_cov_mat, then three additional elements are returned:

Examples

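library(cvCovEst)

# Select among three candidate estimators on the centered and scaled
# mtcars data; only thresholdingEst needs a hyperparameter grid here.
cvCovEst(
  dat = mtcars,
  estimators = c(
    linearShrinkLWEst, thresholdingEst, sampleCovEst
  ),
  estimator_params = list(
    thresholdingEst = list(gamma = seq(0.1, 0.3, 0.1))
  ),
  center = TRUE,
  scale = TRUE
)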

[Package cvCovEst version 0.3.5 Index]