comp_simu_test {ICSOutlier}    R Documentation

Selection of Nonnormal Invariant Components Using Simulations

Description

Identifies invariant coordinates that are nonnormal using simulations under a standard multivariate normal model for a specific data setup and scatter combination.

Usage

comp_simu_test(
  object,
  S1 = NULL,
  S2 = NULL,
  S1_args = list(),
  S2_args = list(),
  m = 10000,
  type = "smallprop",
  level = 0.05,
  adjust = TRUE,
  n_cores = NULL,
  iseed = NULL,
  pkg = "ICSOutlier",
  q_type = 7,
  ...
)

Arguments

object

object of class "ICS" where both S1 and S2 are specified as functions. The sample size and the dimension of interest are also obtained from the object. It is also natural to expect that the invariant coordinates are centered.

S1

an object of class "ICS_scatter" or a function that contains the location vector and scatter matrix as location and scatter components.

S2

an object of class "ICS_scatter" or a function that contains the location vector and scatter matrix as location and scatter components.

S1_args

a list containing additional arguments for S1.

S2_args

a list containing additional arguments for S2.

m

number of simulations. Since extreme quantiles are of interest, m should be large.

type

currently the only type option is "smallprop". See details.

level

the initial level used to make a decision. The cut-off values are the (1-level)th quantile of the eigenvalues obtained from simulations. See details.

adjust

logical. If TRUE, the quantile levels are adjusted. Default is TRUE. See details.

n_cores

number of cores to be used. If NULL or 1, no parallel computing is used. Otherwise makeCluster with type = "PSOCK" is used.

iseed

If parallel computation is used, the seed passed on to clusterSetRNGStream. Default is NULL, which means no fixed seed is used.

pkg

When using parallel computing, a character vector listing all the packages which need to be loaded on the different cores via require. Must be at least "ICSOutlier" and must contain the packages needed to compute the scatter matrices.

q_type

specifies the quantile algorithm used in quantile (an integer between 1 and 9, as for the type argument of quantile).

...

further arguments passed on to the function quantile.

Details

Based on simulations, it detects which of the components follow a univariate normal distribution. More precisely, it identifies the observed eigenvalues that are larger than those obtained from normally distributed data. m standard normal data sets are simulated using the same data size and scatters as specified in the "ICS" object. The cut-off values are determined based on a quantile of these simulated eigenvalues.
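The simulation step described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: it assumes the COV-COV4 scatter pair, and the helpers cov4_matrix() and gen_kurtosis_cov_cov4() are hypothetical names introduced here.

```r
# Hypothetical helper: fourth-moment scatter matrix (COV4) in base R.
cov4_matrix <- function(X) {
  n <- nrow(X)
  p <- ncol(X)
  center <- colMeans(X)
  Xc <- sweep(X, 2, center)
  d2 <- mahalanobis(X, center, cov(X))  # squared Mahalanobis distances
  crossprod(Xc * d2, Xc) / (n * (p + 2))
}

# Hypothetical helper: eigenvalues of COV^{-1} COV4 in decreasing order,
# computed through a symmetric matrix with the same eigenvalues.
gen_kurtosis_cov_cov4 <- function(X) {
  R_inv <- solve(chol(cov(X)))  # cov(X) = t(R) %*% R
  M <- t(R_inv) %*% cov4_matrix(X) %*% R_inv
  eigen((M + t(M)) / 2, symmetric = TRUE, only.values = TRUE)$values
}

set.seed(1)
n <- 200      # sample size (would be taken from the "ICS" object)
p <- 4        # dimension (would be taken from the "ICS" object)
m <- 200      # number of simulations, kept small for the sketch
level <- 0.05

# m x p matrix: one row of ordered eigenvalues per simulated normal data set
sim_ev <- t(replicate(m, gen_kurtosis_cov_cov4(matrix(rnorm(n * p), n, p))))

# Unadjusted cut-offs: the (1 - level) quantile of each column
cutoffs <- apply(sim_ev, 2, quantile, probs = 1 - level, type = 7)
```

Each column of sim_ev approximates the null distribution of one ordered eigenvalue, which is why a large m is needed to estimate extreme quantiles reliably.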

As the eigenvalues of ICS, also known as generalized kurtosis values, are ordered, it is natural to perform the comparison in a specific order depending on the purpose. Currently the only available type is "smallprop": starting with the first component, the observed eigenvalues are successively compared to these cut-off values. The procedure stops when an eigenvalue is below the corresponding cut-off, that is, when a normal component is detected.
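The sequential rule for type = "smallprop" can be illustrated with a few lines of base R. The function select_smallprop() and the numeric values below are hypothetical and serve only to show the stopping behaviour. Treating an eigenvalue equal to its cut-off as not selected is an assumption of this sketch.

```r
# Hypothetical helper: indices of the leading components whose observed
# eigenvalue exceeds the corresponding cut-off, stopping at the first failure.
select_smallprop <- function(eigenvalues, cutoffs) {
  exceeds <- eigenvalues > cutoffs
  first_fail <- match(FALSE, exceeds)
  n_selected <- if (is.na(first_fail)) length(eigenvalues) else first_fail - 1L
  seq_len(n_selected)
}

# Made-up observed eigenvalues and cut-offs, for illustration only
observed <- c(3.10, 1.05, 1.30, 0.90)
cutoffs  <- c(1.40, 1.25, 1.15, 1.05)

selected <- select_smallprop(observed, cutoffs)
selected  # 1
```

The third component exceeds its cut-off but is not selected, because the procedure has already stopped at the second component.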

If adjust = FALSE, all eigenvalues are compared to the same (1-level)th quantile. However, this often leads to too many selected components, so some multiple testing adjustment might be useful. The current default adjusts the quantile level for the jth component to 1-level/j.
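As a small worked example of the adjustment, the quantile levels for six components and level = 0.05 can be computed as follows. The variable names are chosen here for illustration only.

```r
level <- 0.05
p <- 6               # number of components
j <- seq_len(p)

# adjust = FALSE: the same quantile level for every component
probs_unadjusted <- rep(1 - level, p)

# adjust = TRUE: the level for the jth component is 1 - level / j
probs_adjusted <- 1 - level / j

round(probs_adjusted, 4)
# 0.9500 0.9750 0.9833 0.9875 0.9900 0.9917
```

The adjusted levels increase with j, so later components must exceed a more extreme quantile to be selected.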

Note that, depending on the data size and scatters used, this can take a while, so it is more efficient to parallelize the computations. Note also that the function is seldom called directly by the user; it is mostly called internally by ICS_outlier().
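The parallel setup named in the Arguments section (makeCluster with type = "PSOCK" and a seed passed to clusterSetRNGStream) can be sketched with the parallel package alone. The functions one_run() and run_parallel() are hypothetical stand-ins for a single simulation and its parallel driver. They are not the package's internal code.

```r
library(parallel)

# Hypothetical simulation unit: largest eigenvalue of the sample covariance
# matrix of one simulated standard normal data set.
one_run <- function(i, n = 100, p = 3) {
  X <- matrix(rnorm(n * p), n, p)
  max(eigen(cov(X), symmetric = TRUE, only.values = TRUE)$values)
}

# Hypothetical driver: m runs on a PSOCK cluster with a fixed RNG stream seed.
run_parallel <- function(m, n_cores, iseed) {
  cl <- makeCluster(n_cores, type = "PSOCK")
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, iseed = iseed)
  parSapply(cl, seq_len(m), one_run)
}

res1 <- run_parallel(m = 20, n_cores = 2, iseed = 123)
res2 <- run_parallel(m = 20, n_cores = 2, iseed = 123)

# With the same seed and the same number of workers the results should match
identical(res1, res2)
```

Results generated this way depend on both the seed and the number of workers, so reproducing a run requires fixing both.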

Value

A list containing:

Author(s)

Aurore Archimbaud and Klaus Nordhausen

References

Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2018), ICS for multivariate outlier detection with application to quality control. Computational Statistics & Data Analysis, 128:184-199. ISSN 0167-9473. doi:10.1016/j.csda.2018.06.011.

See Also

ICS(), comp_norm_test()

Examples

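# For a real analysis use larger values for m and more cores if available
library(ICSOutlier)
library(ICS)      # provides ICS(), ICS_cov() and ICS_cov4()
library(mvtnorm)  # provides rmvnorm()
set.seed(123)
Z <- rmvnorm(1000, rep(0, 6))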
# Add 20 outliers on the first component
Z[1:20, 1] <- Z[1:20, 1] + 10
pairs(Z)
icsZ <- ICS(Z)
# Small m for demonstration purposes only; should select the first component
comp_simu_test(icsZ, S1 = ICS_cov, S2 = ICS_cov4, m = 400, n_cores = 1)

## Not run: 
  # Using two cores
  # Small m for demonstration purposes only; should select the first component
  comp_simu_test(icsZ, S1 = ICS_cov, S2 = ICS_cov4, m = 500, n_cores = 2, iseed = 123)

  # Using several cores and a scatter function from a different package
  # The parallel package is used to detect the number of cores automatically
  library(parallel)
  # ICS with MCD estimates and the usual estimates
  library(ICSClust)
  icsZmcd <- ICS(Z, S1 = ICS_mcd_raw, S2 = ICS_cov, S1_args = list(alpha = 0.75))
  # Small m for demonstration purposes only; should select the first component
  comp_simu_test(icsZmcd, S1 = ICS_mcd_raw, S2 = ICS_cov,
                 S1_args = list(alpha = 0.75, location = TRUE),
                 m = 500, n_cores = detectCores() - 1,
                 pkg = c("ICSOutlier", "ICSClust"), iseed = 123)

## End(Not run)
# Example with no outlier
Z0 <- rmvnorm(1000, rep(0, 6))
pairs(Z0)
icsZ0 <- ICS(Z0)
# Should select no component
comp_simu_test(icsZ0, S1 = ICS_cov, S2 = ICS_cov4, m = 400, level = 0.01, n_cores = 1)

[Package ICSOutlier version 0.4-0 Index]