spadimo {crmReg}    R Documentation

SPArse DIrections of Maximal Outlyingness

Description

Computes the sparse directions of maximal outlyingness of a given observation and shows diagnostic plots for analyzing that observation.

Usage

spadimo(data, weights, obs,
        control = list(scaleFun = Qn, nlatent = 1, etas = NULL, csqcritv = 0.975,
                       stopearly = FALSE, trace = FALSE, plot = TRUE))

Arguments

data

the data as a data frame.

weights

a numeric vector containing the case weights from a robust estimator.

obs

the (integer) case number under consideration.

control

a list of options that control details of the SPADIMO algorithm. The following options are available:

  • scaleFun
    function used for robustly scaling the variables (e.g. Qn or mad).

  • nlatent
    integer number of latent variables for sparse PLS regression (via SNIPLS) (default is 1).

  • etas
    vector of decreasing sparsity parameters (default is NULL, in which case etas = seq(0.9, 0.1, -0.05) if n > p, and etas = seq(0.6, 0.1, -0.05) otherwise).

  • csqcritv
    probability level for internal chi-squared quantile (used when n > p) (default is 0.975).

  • stopearly
    if TRUE, method stops as soon as the reduced case is no longer outlying, else if FALSE (default) it loops through all values of eta.

  • trace
    should intermediate results be printed (default is FALSE).

  • plot
    should heatmaps and graph of the results be shown (default is TRUE).
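
The documented defaults for etas and csqcritv can be reproduced in base R. The block below is only a sketch of the rule as described above, not the package's own code: default_etas is a hypothetical helper introduced for illustration, and how spadimo applies the chi-squared quantile internally is not stated on this page.

```r
# Hypothetical helper (not part of crmReg): the default grid of sparsity
# parameters as described for the 'etas' option.
default_etas <- function(n, p) {
  if (n > p) seq(0.9, 0.1, -0.05) else seq(0.6, 0.1, -0.05)
}

etas_tall <- default_etas(n = 100, p = 10)  # more cases than variables
etas_wide <- default_etas(n = 20, p = 50)   # more variables than cases

length(etas_tall)
length(etas_wide)

# Chi-squared quantile at the default probability level csqcritv = 0.975.
# Assumption: degrees of freedom equal to the number of variables (here 10).
chisq_quantile <- qchisq(0.975, df = 10)

# A control list can name only the options that should differ from the
# defaults, as Example 2 in the Examples section does.
my_control <- list(etas = seq(0.8, 0.2, by = -0.1),
                   stopearly = TRUE,
                   plot = FALSE)
```

A coarser grid such as the one in my_control would presumably shorten the loop over eta, at the cost of a less precise cutoff.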

Details

Given an observation that has been detected as an outlier, SPADIMO (Debruyne et al., 2019) finds the subset of variables contributing most to the outlier’s outlyingness. Here, the outlyingness of a data point is defined as its robust Mahalanobis distance. The relevant variables are found by checking the direction in which the observation is most outlying. SPADIMO estimates this direction of maximal outlyingness in a sparse manner. In this way, the method helps to understand how an outlier deviates from the bulk of the data.
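
The idea can be illustrated with base R on simulated data. The sketch below is not the SPADIMO algorithm: it computes the ordinary (non-sparse) direction of maximal outlyingness, which for a Mahalanobis-type outlyingness is proportional to the inverse scatter matrix times the centered observation. The simulated data, the 0/1 case weights, and the use of cov.wt for the weighted location and scatter are all assumptions made for illustration.

```r
set.seed(1)
n <- 200
p <- 5
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("V", 1:p)))
X[1, c(2, 4)] <- c(8, -7)            # plant an outlier in variables V2 and V4

# Hypothetical 0/1 case weights; in practice they come from a robust estimator.
w <- rep(1, n)
w[1] <- 0

# Weighted location and scatter (the outlying case gets zero weight).
fit <- cov.wt(X, wt = w / sum(w))

# Outlyingness of case 1: Mahalanobis distance w.r.t. the weighted estimates.
d  <- X[1, ] - fit$center
md <- sqrt(mahalanobis(X[1, ], fit$center, fit$cov))

# Non-sparse direction of maximal outlyingness, scaled to unit length.
a <- solve(fit$cov, d)
a <- a / sqrt(sum(a^2))
names(a) <- colnames(X)

# The outlyingness of case 1 along 'a' equals its Mahalanobis distance.
proj <- abs(sum(a * d)) / sqrt(drop(t(a) %*% fit$cov %*% a))
all.equal(proj, md)

# Variables with the largest absolute coefficients in 'a'.
top_vars <- names(a)[order(abs(a), decreasing = TRUE)[1:2]]
top_vars
```

In this dense direction every coefficient is nonzero, so the contributing variables have to be read off by size. A sparse estimate sets the coefficients of non-contributing variables exactly to zero, which is what makes the outlvars element directly interpretable.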

Value

spadimo returns a list containing the following elements:

outlvars

vector containing individual variable names contributing most to obs's outlyingness.

outlvarslist

list of variables contributing to obs's outlyingness for different values of eta.

a

vector, the sparse direction of maximal outlyingness.

alist

list of sparse directions of maximal outlyingness for different values of eta.

o.before

outlyingness of original case (n > p) or PCA outlier flag (n <= p) before removing outlying variables.

o.after

outlyingness of reduced case (n > p) or PCA outlier flag (n <= p) after removing outlying variables.

eta

value of the sparsity parameter eta at which obs is no longer outlying.

time

time to execute the SPADIMO algorithm.

control

a list of the control parameters that were used.

Author(s)

Michiel Debruyne, Sebastiaan Hoppner, Sven Serneels, and Tim Verdonck

References

Debruyne, M., Hoppner, S., Serneels, S., and Verdonck, T. (2019). Outlyingness: Which variables contribute most? Statistics and Computing, 29 (4), 707–723. DOI:10.1007/s11222-018-9831-5

See Also

crm, predict.crm, cellwiseheatmap, daprpr

Examples

library(crmReg)
data(topgear)

# get case weights from a robust estimator (covMcd function in robustbase package):
MCD <- robustbase::covMcd(topgear, alpha = 0.5)

# SPADIMO with diagnostic plots:
# Example 1:
Peugeot <- spadimo(data = topgear,
                   weights = MCD$mcd.wt,
                   obs = which(rownames(topgear) == "Peugeot 107"))
# check the plots!
# individual variable names contributing most to Peugeot 107's outlyingness:
print(Peugeot$outlvars)
# sparse direction of maximal outlyingness with eta = Peugeot$eta:
print(Peugeot$a)
# default SPADIMO control parameters:
print(Peugeot$control)

# Example 2:
Bugatti <- spadimo(data = topgear,
                   weights = MCD$mcd.wt,
                   obs = which(rownames(topgear) == "Bugatti Veyron"),
                   control = list(stopearly = TRUE, trace = TRUE, plot = TRUE))
# check the plots!
# individual variable names contributing most to Bugatti Veyron's outlyingness:
print(Bugatti$outlvars)
# sparse direction of maximal outlyingness with eta = Bugatti$eta:
print(Bugatti$a)

[Package crmReg version 1.0.2 Index]