mmcd {robustmatrix} | R Documentation |
The Matrix Minimum Covariance Determinant (MMCD) Estimator
Description
mmcd computes the robust MMCD estimators of location and covariance for matrix-variate data using the FastMMCD algorithm (double-blind 2024).
Usage
mmcd(
X,
nsamp = 500L,
alpha = 0.5,
lambda = 0,
max_iter_cstep = 100L,
max_iter_MLE = 100L,
max_iter_cstep_init = 2L,
max_iter_MLE_init = 2L,
adapt_alpha = TRUE,
reweight = TRUE,
scale_consistency = "quant",
outlier_quant = 0.975,
nthreads = 1L
)
Arguments
X |
a 3d array of dimension |
nsamp |
number of initial h-subsets (default is 500). |
alpha |
numeric parameter between 0.5 (default) and 1. Controls the size |
lambda |
a smoothing parameter for the rowwise and columnwise covariance matrices. |
max_iter_cstep |
upper limit of C-step iterations (default is 100) |
max_iter_MLE |
upper limit of MLE iterations (default is 100) |
max_iter_cstep_init |
upper limit of C-step iterations for initial h-subsets (default is 2) |
max_iter_MLE_init |
upper limit of MLE iterations for initial h-subsets (default is 2) |
adapt_alpha |
Logical. If TRUE (default) alpha is adapted to take the dimension of the data into account. |
reweight |
Logical. If TRUE (default) the reweighted MMCD estimators are computed. |
scale_consistency |
Character. Either "quant" (default) or "mmd_med". If "quant", the consistency factor is chosen to achieve consistency under the matrix normal distribution. If "mmd_med", the consistency factor is chosen based on the Mahalanobis distances of the observations. |
outlier_quant |
numeric parameter between 0 and 1. Chi-square quantile used in the reweighting step. |
nthreads |
Integer. If 1 (default), all computations are carried out sequentially.
If larger than 1, C-steps are carried out in parallel using |
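To make the role of outlier_quant concrete, the following base R sketch computes a chi-square cutoff and flags toy squared distances that exceed it. The distances are made-up values, not output of mmcd, and the degrees of freedom p*q are an assumption taken from the plotting code in the Examples; the exact rule used inside the reweighting step may differ.

```r
# Hypothetical toy values, not produced by mmcd()
p <- 2; q <- 3
outlier_quant <- 0.975

# Assumption: the chi-square reference distribution has p * q degrees of freedom
cutoff <- qchisq(outlier_quant, df = p * q)

# Made-up squared Mahalanobis distances for six observations
d2 <- c(1.8, 5.2, 14.0, 22.7, 3.3, 40.1)

# Observations at or below the cutoff would be kept for reweighting
keep <- d2 <= cutoff
flagged <- which(!keep)
flagged
```

A larger outlier_quant raises the cutoff, so fewer observations are treated as outliers.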
Details
The MMCD estimators generalize the well-known Minimum Covariance Determinant (MCD)
(Rousseeuw 1985; Rousseeuw and Driessen 1999) to the matrix-variate setting.
It looks for the h observations, h = \alpha * n, whose covariance matrix has the smallest determinant.
The FastMMCD algorithm is used for computation and is described in detail in (double-blind 2024).
NOTE: The procedure depends on random initial subsets. Currently, setting a seed is only possible if nthreads = 1.
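The idea of selecting the h observations can be illustrated with a small base R sketch of the subset selection used in a generic MCD-style concentration step: given squared distances under the current estimates, keep the h observations with the smallest distances. This is not the package's implementation; the toy distances are made up, and rounding h with floor() is an assumption (with adapt_alpha = TRUE the effective alpha is also adjusted).

```r
# Illustrative only: subset selection in a generic concentration step
alpha <- 0.5

# Made-up squared distances of n = 8 observations under current estimates
d2 <- c(4.1, 0.3, 9.7, 1.2, 0.8, 15.0, 2.5, 0.1)
n <- length(d2)

# Assumed rounding of h = alpha * n; the package may round differently
h <- floor(alpha * n)

# Indices of the h observations with the smallest distances
h_subset <- sort(order(d2)[seq_len(h)])
h_subset
```

In a full algorithm, location and covariance would be re-estimated from this subset and the selection repeated until the objective stops decreasing.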
Value
A list containing the following:
mu |
Estimated |
cov_row |
Estimated |
cov_col |
Estimated |
cov_row_inv |
Inverse of |
cov_col_inv |
Inverse of |
md |
Squared Mahalanobis distances. |
md_raw |
Squared Mahalanobis distances based on raw MMCD estimators. |
det |
Value of objective function (determinant of Kronecker product of rowwise and columnwise covariance). |
alpha |
The (adjusted) value of alpha used to determine the size of the h-subset. |
consistency_factors |
Consistency factors for raw and reweighted MMCD estimators. |
dets |
Objective values for the final h-subsets. |
best_i |
ID of subset with best objective. |
h_subset |
Final h-subset of raw MMCD estimators. |
h_subset_reweighted |
Final h-subset of reweighted MMCD estimators. |
iterations |
Number of C-steps. |
dets_init_first |
Objective values for the |
subsets_first |
Subsets created in subsampling procedure for large |
dets_init_second |
Objective values of the 10 best initial subsets after executing C-steps until convergence. |
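As a rough illustration of the objective described for det, the base R sketch below evaluates the determinant of a Kronecker product directly and via the identity det(B %x% A) = det(A)^q * det(B)^p for a p x p matrix A and a q x q matrix B. The matrices are the ones from the Examples; the ordering of the Kronecker factors is an assumption (it does not change the determinant), and whether mmcd reports the objective on exactly this scale is not stated here.

```r
# Covariance matrices taken from the Examples section
p <- 2; q <- 3
cov_row <- matrix(c(1, 0.5, 0.5, 1), nrow = p, ncol = p)
cov_col <- matrix(c(3, 2, 1, 2, 3, 2, 1, 2, 3), nrow = q, ncol = q)

# Determinant of the full (p*q) x (p*q) Kronecker product
det_full <- det(kronecker(cov_col, cov_row))

# Same quantity from the two small determinants, avoiding the large matrix
det_fact <- det(cov_row)^q * det(cov_col)^p

all.equal(det_full, det_fact)
```

The factored form only needs determinants of the p x p and q x q matrices, which is cheaper when p*q is large.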
References
Rousseeuw P (1985).
“Multivariate Estimation With High Breakdown Point.”
Mathematical Statistics and Applications Vol. B, 283-297.
doi:10.1007/978-94-009-5438-0_20.
Rousseeuw PJ, Driessen KV (1999).
“A Fast Algorithm for the Minimum Covariance Determinant Estimator.”
Technometrics, 41(3), 212-223.
doi:10.1080/00401706.1999.10485670.
double-blind (2024).
“Robust covariance estimation and explainable outlier detection for matrix-valued data.”
[Manuscript submitted for publication].
See Also
The mmcd algorithm uses the cstep and mmle functions.
Examples
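library(robustmatrix)

# Simulate n matrix-variate normal observations of dimension p x q
n <- 1000; p <- 2; q <- 3
mu <- matrix(rep(0, p*q), nrow = p, ncol = q)
cov_row <- matrix(c(1,0.5,0.5,1), nrow = p, ncol = p)
cov_col <- matrix(c(3,2,1,2,3,2,1,2,3), nrow = q, ncol = q)
X <- rmatnorm(n = n, mu, cov_row, cov_col)

# Replace 30% of the observations with outliers whose mean is 10 in every cell
ind <- sample(1:n, 0.3*n)
X[,,ind] <- rmatnorm(n = length(ind), matrix(rep(10, p*q), nrow = p, ncol = q), cov_row, cov_col)

# Fit the non-robust MLE and the robust MMCD estimators
par_mmle <- mmle(X)
par_mmcd <- mmcd(X)

# Mahalanobis distances of all observations under each fit
distances_mmle <- mmd(X, par_mmle$mu, par_mmle$cov_row, par_mmle$cov_col)
distances_mmcd <- mmd(X, par_mmcd$mu, par_mmcd$cov_row, par_mmcd$cov_col)

# Compare both sets of distances; dashed lines mark the 0.99 chi-square quantile (p*q df)
plot(distances_mmle, distances_mmcd)
abline(h = qchisq(0.99, p*q), lty = 2, col = "red")
abline(v = qchisq(0.99, p*q), lty = 2, col = "red")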