mrct {mrct}    R Documentation

Minimum regularized covariance trace estimator

Description

Functional outlier detection based on the minimum regularized covariance trace estimator (Oguamalam et al. 2023) as a robust covariance estimator. This estimator uses a generalization of the Mahalanobis distance for the functional setting (Berrendero et al. 2020) and a corresponding theoretical cutoff value.
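
To make the regularized distance concrete, here is a minimal base R sketch. It assumes the squared alpha-Mahalanobis distance takes the spectral form sum_j lambda_j / (lambda_j + alpha)^2 * <x - m, e_j>^2, with eigenpairs (lambda_j, e_j) of the covariance. The helper alpha.mhd() is hypothetical and not part of the package. It uses the plain sample mean and covariance, not the robust MRCT estimate, and ignores grid-spacing weights.

```r
# Hypothetical helper (not part of mrct): regularized Mahalanobis distance
# for curves stored as rows of a matrix, using the NON-robust sample
# mean and covariance.
alpha.mhd <- function(data, alpha = 0.01) {
  centered <- sweep(data, 2, colMeans(data))   # subtract the pointwise mean curve
  eig <- eigen(cov(data), symmetric = TRUE)    # eigenpairs of the sample covariance
  lambda <- pmax(eig$values, 0)                # guard against tiny negative eigenvalues
  scores <- centered %*% eig$vectors           # projections onto the eigenvectors
  weights <- lambda / (lambda + alpha)^2       # Tikhonov-regularized weights
  sqrt(drop(scores^2 %*% weights))             # one distance per row of data
}

set.seed(1)
x <- matrix(rnorm(20 * 10), nrow = 20)         # 20 toy "curves" on 10 grid points
d <- alpha.mhd(x, alpha = 0.1)
```

As the weights show, a larger alpha shrinks the contribution of directions with small eigenvalues, which is what keeps the distance well defined in the functional setting.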

Usage

mrct(
  data,
  h = 0.75,
  alpha = 0.01,
  initializations = 5,
  subset.iteration = 10,
  seed = 123,
  scaling.iterations = 10,
  scaling.tolerance = 10^(-4),
  criterion = "sum",
  sum.percentage = 0.75
)

Arguments

data

Numeric matrix of a functional data set for which the estimator is to be calculated. Each row contains one observation. All observations are assumed to be observed on the same regular grid.

h

Numeric value between 0.5 and 1. Fraction of the data on which the estimator is based. Default is 0.75, i.e. 75% of the data are used for the estimator.

alpha

Numeric (default is 0.01). Tikhonov regularization parameter alpha.

initializations

Integer (default is 5). Number of random initial subsets.

subset.iteration

Integer (default is 10). Maximum number of times each subset is re-estimated and adjusted.

seed

Integer (default is 123). Random seed for reproducibility.

scaling.iterations

Integer (default is 10). The maximum number of times k_1 is re-scaled if the error between subsequent scaling parameters does not fall below scaling.tolerance.

scaling.tolerance

Numeric (default is 10^(-4)). The error tolerance for re-scaling. If the error falls below this value, the re-scaling procedure stops.

criterion

Character. Criterion based on which the optimal subset is chosen among the final subsets. Possible options are: "cluster" and the default "sum".

sum.percentage

Numeric value between 0.5 and 1. Only used with the "sum" criterion. Determines the fraction of observations over which the functional Mahalanobis distances, sorted in ascending order, are summed. Default is 0.75, i.e. the sum of the smallest 75% of the Mahalanobis distances is calculated. If outliers are present, this value should not be too high, so that no outlying curves are included.
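
The "sum" criterion described for sum.percentage can be sketched in a few lines of base R. The helper sum.criterion() is hypothetical and not part of the package, and rounding the number of summed distances down is an assumption. The package may round differently.

```r
# Hypothetical helper (not part of mrct): sum of the smallest fraction
# `sum.percentage` of the distances.
sum.criterion <- function(distances, sum.percentage = 0.75) {
  m <- floor(sum.percentage * length(distances))  # assumption: round down
  sum(sort(distances)[seq_len(m)])                # ascending order, first m values
}

# Toy distances with one clear outlier (100); the three smallest are summed
obj <- sum.criterion(c(5, 1, 3, 100), sum.percentage = 0.75)
```

In this toy case the outlying value 100 is excluded from the sum, which illustrates why sum.percentage should stay below the expected share of regular curves.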

Value

A list:

theoretical

Integer vector of the indices corresponding to the outliers based on the MRCT estimator.

theoretical.w

Same as theoretical with an additional re-weighting step.

aMHD

Numeric vector containing the functional Mahalanobis distances of all observations based on the MRCT estimator.

aMHD.w

Same as aMHD with an additional re-weighting step.

quant

Numeric. Theoretical cutoff value for outlier detection.

quant.w

Same as quant with an additional re-weighting step.

k

Numeric. Scaling parameter k_1 of Algorithm 1 described in Oguamalam et al. (2023).

k.w

Same as k with an additional re-weighting step.

optimal.subset

Integer vector of the optimal h-subset.

subsets

Numeric matrix containing all final subsets. Each row of subsets is one final subset.

objval

Numeric vector with the objective values of the final subsets based on criterion.
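
The relation between aMHD, quant and theoretical can be illustrated with a mock result. The list below only mimics the documented element names, and its values are invented for illustration. It is an assumption, not stated above, that the flagged indices are exactly those whose distance exceeds the cutoff.

```r
# Mock list with the documented element names; the values are invented
res <- list(aMHD = c(0.8, 1.1, 5.2, 0.9),
            quant = 2.5,
            theoretical = 3L)

# Assumption: an observation is flagged when its distance exceeds the cutoff
flagged <- which(res$aMHD > res$quant)
identical(flagged, res$theoretical)
```

The same comparison could be made with aMHD.w, quant.w and theoretical.w for the re-weighted version.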

References

Berrendero JR, Bueno-Larraz B, Cuevas A (2020). “On Mahalanobis Distance in Functional Settings.” J. Mach. Learn. Res., 21(9), 1–33.

Oguamalam J, Radojičić U, Filzmoser P (2023). “Minimum regularized covariance trace estimator and outlier detection for functional data.” https://doi.org/10.48550/arXiv.2307.13509.

Examples

# Fix seed for reproducibility
set.seed(123)

# Sample outlying indices
cont.ind <- sample(1:50, size=10)

# Generate 50 curves on the interval [0,1] at 50 timepoints with 20% outliers
y <- mrct.rgauss(x.grid=seq(0,1,length.out=50), N=50, model=1,
                 outliers=cont.ind, method="linear")

# Visualize curves (regular curves grey, outliers black)
colormap <- rep("grey",50); colormap[cont.ind] <- "black"
matplot(x=seq(0,1,length.out=50), y=t(y), type="l", lty="solid",
        col=colormap, xlab="t",ylab="")

# Run MRCT
mrct.y <- mrct(data=y, h=0.75, alpha=0.1,
               initializations=10, criterion="sum")

# Visualize alpha-Mahalanobis distance with cutoff (horizontal black line)
# Colors correspond to simulated outliers, shapes to estimated (MRCT) ones
# (circle regular and triangle irregular curves)
shapemap <- rep(1,50); shapemap[mrct.y$theoretical.w] <- 2
plot(x=1:50, y=mrct.y$aMHD.w, col=colormap, pch=shapemap,
     xlab="Index", ylab=expression(alpha*"-MHD"))
abline(h = mrct.y$quant.w)

# If you don't have any information on possible outliers,
# you could alternatively use the plotting function mrct.plot()
mrct.plot(mrct.y)

[Package mrct version 0.0.1.0 Index]