mmcd {robustmatrix}    R Documentation

The Matrix Minimum Covariance Determinant (MMCD) Estimator

Description

mmcd computes the robust MMCD estimators of location and covariance for matrix-variate data using the FastMMCD algorithm (double-blind 2024).

Usage

mmcd(
  X,
  nsamp = 500L,
  alpha = 0.5,
  lambda = 0,
  max_iter_cstep = 100L,
  max_iter_MLE = 100L,
  max_iter_cstep_init = 2L,
  max_iter_MLE_init = 2L,
  adapt_alpha = TRUE,
  reweight = TRUE,
  scale_consistency = "quant",
  outlier_quant = 0.975,
  nthreads = 1L
)

Arguments

X

a 3d array of dimension (p,q,n), containing n matrix-variate samples of p rows and q columns in each slice.

nsamp

number of initial h-subsets (default is 500).

alpha

numeric parameter between 0.5 (default) and 1. Controls the size h \approx alpha * n of the h-subset over which the determinant is minimized.

lambda

a smoothing parameter for the rowwise and columnwise covariance matrices (default is 0).

max_iter_cstep

upper limit of C-step iterations (default is 100).

max_iter_MLE

upper limit of MLE iterations (default is 100).

max_iter_cstep_init

upper limit of C-step iterations for initial h-subsets (default is 2).

max_iter_MLE_init

upper limit of MLE iterations for initial h-subsets (default is 2).

adapt_alpha

Logical. If TRUE (default) alpha is adapted to take the dimension of the data into account.

reweight

Logical. If TRUE (default) the reweighted MMCD estimators are computed.

scale_consistency

Character. Either "quant" (default) or "mmd_med". If "quant", the consistency factor is chosen to achieve consistency under the matrix normal distribution. If "mmd_med", the consistency factor is chosen based on the Mahalanobis distances of the observations.

outlier_quant

numeric parameter between 0 and 1 (default is 0.975). Probability level of the chi-square quantile used as the cutoff in the reweighting step.

nthreads

Integer. If 1 (default), all computations are carried out sequentially. If larger than 1, C-steps are carried out in parallel using nthreads threads. If < 0, all possible threads are used.
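
As a rough illustration of how alpha and outlier_quant translate into numbers, the base R sketch below computes an approximate h-subset size and a reweighting cutoff. It assumes that h is taken as floor(alpha * n) and that the chi-square quantile uses p * q degrees of freedom (as in the Examples section). The exact rule inside mmcd, in particular with adapt_alpha = TRUE, may differ, and md_example is a made-up vector of distances.

```r
# Illustrative values only; not taken from the package internals.
n <- 1000; p <- 2; q <- 3
alpha <- 0.5
outlier_quant <- 0.975

# Assumed subset size: roughly alpha * n observations.
h <- floor(alpha * n)

# Assumed cutoff: chi-square quantile with p * q degrees of freedom.
cutoff <- qchisq(outlier_quant, df = p * q)

# Flag hypothetical squared distances that exceed the cutoff.
md_example <- c(2.1, 5.7, 30.4)
flagged <- md_example > cutoff
```

Under these assumptions, observations whose squared distance exceeds cutoff would receive zero weight in a reweighting step.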

Details

The MMCD estimators generalize the well-known Minimum Covariance Determinant (MCD) estimator (Rousseeuw 1985; Rousseeuw and Driessen 1999) to the matrix-variate setting. They are computed from the h observations, h \approx alpha * n, whose estimated covariance (the Kronecker product of the rowwise and columnwise covariance matrices) has the smallest determinant. The FastMMCD algorithm is used for computation and is described in detail in (double-blind 2024). NOTE: The procedure depends on random initial subsets. Currently, setting a seed is only possible if nthreads = 1.
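
The idea of iterating concentration steps (C-steps) on an h-subset can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the FastMMCD implementation: fit_subset, sq_dist and log_det are hypothetical helpers, the covariance estimates use a plain flip-flop iteration with a fixed number of passes, and a single random start replaces the nsamp initial subsets.

```r
set.seed(1)
p <- 2; q <- 3; n <- 200

# Simulated p x q x n array; the first 40 slices are shifted to act as outliers.
X <- array(rnorm(p * q * n), dim = c(p, q, n))
out <- 1:40
X[, , out] <- X[, , out] + 10

# Flip-flop estimates of mean, rowwise and columnwise covariance on a subset.
fit_subset <- function(X, idx, iters = 5) {
  p <- dim(X)[1]; q <- dim(X)[2]; m <- length(idx)
  mu <- apply(X[, , idx, drop = FALSE], c(1, 2), mean)
  cov_row <- diag(p); cov_col <- diag(q)
  for (it in seq_len(iters)) {
    col_inv <- solve(cov_col)
    cov_row <- Reduce(`+`, lapply(idx, function(i) {
      R <- X[, , i] - mu
      R %*% col_inv %*% t(R)
    })) / (m * q)
    row_inv <- solve(cov_row)
    cov_col <- Reduce(`+`, lapply(idx, function(i) {
      R <- X[, , i] - mu
      t(R) %*% row_inv %*% R
    })) / (m * p)
  }
  list(mu = mu, cov_row = cov_row, cov_col = cov_col)
}

# Squared matrix Mahalanobis distance of every slice to a given fit.
sq_dist <- function(X, fit) {
  row_inv <- solve(fit$cov_row); col_inv <- solve(fit$cov_col)
  apply(X, 3, function(Xi) {
    R <- Xi - fit$mu
    sum(diag(col_inv %*% t(R) %*% row_inv %*% R))
  })
}

# Log-determinant of the Kronecker product of the two covariance matrices.
log_det <- function(fit) {
  p <- nrow(fit$cov_row); q <- nrow(fit$cov_col)
  q * as.numeric(determinant(fit$cov_row)$modulus) +
    p * as.numeric(determinant(fit$cov_col)$modulus)
}

# C-steps: refit on the h observations with the smallest distances.
h <- floor(0.5 * n)
idx <- sort(sample(n, h))
for (step in 1:10) {
  fit <- fit_subset(X, idx)
  new_idx <- sort(order(sq_dist(X, fit))[1:h])
  if (identical(new_idx, idx)) break
  idx <- new_idx
}
fit <- fit_subset(X, idx)
objective <- log_det(fit)
n_outliers_kept <- sum(idx %in% out)
```

Here n_outliers_kept counts how many of the shifted slices remain in the final subset. A single random start gives no guarantee of finding the best subset, which is why the actual algorithm draws many initial subsets.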

Value

A list containing the following:

mu

Estimated p \times q mean matrix.

cov_row

Estimated p \times p rowwise covariance matrix.

cov_col

Estimated q \times q columnwise covariance matrix.

cov_row_inv

Inverse of cov_row.

cov_col_inv

Inverse of cov_col.

md

Squared Mahalanobis distances.

md_raw

Squared Mahalanobis distances based on raw MMCD estimators.

det

Value of objective function (determinant of Kronecker product of rowwise and columnwise covariance).

alpha

The (adjusted) value of alpha used to determine the size of the h-subset.

consistency_factors

Consistency factors for raw and reweighted MMCD estimators.

dets

Objective values for the final h-subsets.

best_i

ID of subset with best objective.

h_subset

Final h-subset of raw MMCD estimators.

h_subset_reweighted

Final h-subset of reweighted MMCD estimators.

iterations

Number of C-steps.

dets_init_first

Objective values for the nsamp initial h-subsets after max_iter_cstep_init C-steps.

subsets_first

Subsets created in subsampling procedure for large n.

dets_init_second

Objective values of the 10 best initial subsets after executing C-steps until convergence.
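
To make the relationship between the returned covariance matrices, the objective value and the squared Mahalanobis distances more concrete, the base R sketch below checks two standard identities. It assumes the usual matrix normal convention in which the covariance of the column-stacked observation is the Kronecker product of the columnwise and rowwise covariance matrices; whether mmcd and mmd compute these quantities in exactly this way is not stated here. The residual matrix R is a made-up observation centered at mu = 0.

```r
# Covariance matrices borrowed from the Examples section.
p <- 2; q <- 3
cov_row <- matrix(c(1, 0.5, 0.5, 1), nrow = p)
cov_col <- matrix(c(3, 2, 1, 2, 3, 2, 1, 2, 3), nrow = q)

# Determinant of the Kronecker product versus its factorized form.
det_kron <- det(kronecker(cov_col, cov_row))
det_fact <- det(cov_row)^q * det(cov_col)^p

# Squared Mahalanobis distance of one hypothetical centered observation:
# trace form with the two small inverses versus the vectorized form.
R <- matrix(c(1, -1, 0.5, 2, 0, -0.5), nrow = p)
md_trace <- sum(diag(solve(cov_col) %*% t(R) %*% solve(cov_row) %*% R))
md_vec <- drop(t(as.vector(R)) %*% solve(kronecker(cov_col, cov_row)) %*% as.vector(R))

stopifnot(isTRUE(all.equal(det_kron, det_fact)),
          isTRUE(all.equal(md_trace, md_vec)))
```

The factorized forms only require inverting and taking determinants of the p \times p and q \times q matrices, never the full pq \times pq Kronecker product.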

References

Rousseeuw P (1985). “Multivariate Estimation With High Breakdown Point.” Mathematical Statistics and Applications Vol. B, 283-297. doi:10.1007/978-94-009-5438-0_20.

Rousseeuw PJ, Driessen KV (1999). “A Fast Algorithm for the Minimum Covariance Determinant Estimator.” Technometrics, 41(3), 212-223. doi:10.1080/00401706.1999.10485670.

double-blind (2024). “Robust covariance estimation and explainable outlier detection for matrix-valued data.” [Manuscript submitted for publication].

See Also

The mmcd algorithm uses the cstep and mmle functions.

Examples

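library(robustmatrix)
set.seed(1)  # a seed can be set because nthreads = 1 (the default)

# Clean data: n matrix normal samples with p rows and q columns
n <- 1000; p <- 2; q <- 3
mu <- matrix(0, nrow = p, ncol = q)
cov_row <- matrix(c(1, 0.5, 0.5, 1), nrow = p, ncol = p)
cov_col <- matrix(c(3, 2, 1, 2, 3, 2, 1, 2, 3), nrow = q, ncol = q)
X <- rmatnorm(n = n, mu, cov_row, cov_col)

# Replace 30% of the samples with outliers whose mean is shifted to 10
ind <- sample(1:n, 0.3 * n)
X[, , ind] <- rmatnorm(n = length(ind), matrix(10, nrow = p, ncol = q), cov_row, cov_col)

# Fit the non-robust MLE and the robust MMCD estimators
par_mmle <- mmle(X)
par_mmcd <- mmcd(X)
distances_mmle <- mmd(X, par_mmle$mu, par_mmle$cov_row, par_mmle$cov_col)
distances_mmcd <- mmd(X, par_mmcd$mu, par_mmcd$cov_row, par_mmcd$cov_col)

# Plot both sets of distances with 0.99 chi-square quantiles (p * q degrees of freedom)
plot(distances_mmle, distances_mmcd)
abline(h = qchisq(0.99, p * q), lty = 2, col = "red")
abline(v = qchisq(0.99, p * q), lty = 2, col = "red")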

[Package robustmatrix version 0.1.2 Index]