gen_glob_outl {SpatialBSS}    R Documentation

Contamination with Global Outliers


Description

Generates synthetic global outliers and uses them to contaminate a given p-variate random field.


Usage

gen_glob_outl(x, alpha = 0.05, h = 10, random_sign = FALSE)



Arguments

x: a numeric matrix of dimension c(n, p) where the p columns correspond to the entries of the random field and the n rows are the observations.

alpha: a numerical value between 0 and 1 giving the proportion of observations to contaminate.

h: a numerical constant that determines the size of the outliers; see Details.

random_sign: logical. If TRUE, the sign of each component of the outlier is randomly selected. Default is FALSE. See Details.


Details

gen_glob_outl generates outliers for a given field by randomly selecting round(alpha * n) observations x_i and contaminating them by setting x^{out}_i = (c^i)'x_i, where the elements c^i_j of the vector c^i are determined by the parameter random_sign. If random_sign = TRUE, c^i_j is either h or -h with P(c^i_j = h) = P(c^i_j = -h) = 0.5. If random_sign = FALSE, c^i_j = h for all j = 1, ..., p and i = 1, ..., n. The parameter alpha determines the contamination rate α and the parameter h determines the size of the outliers.
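The contamination scheme described here can be sketched in a few lines of base R. This is a hypothetical re-implementation for illustration only, not the SpatialBSS source: the function name contaminate_sketch and the column name is_outlier are made up, and the product of c^i and x_i is read as a componentwise scaling, which is an assumption about the formula.

```r
# Illustrative sketch only; NOT the SpatialBSS implementation.
# Assumption: each component j of a selected row i is multiplied by c^i_j.
contaminate_sketch <- function(x, alpha = 0.05, h = 10, random_sign = FALSE) {
  n <- nrow(x)
  p <- ncol(x)
  n_out <- round(alpha * n)          # number of observations to contaminate
  idx <- sample.int(n, n_out)        # rows chosen at random, without replacement

  # Multipliers c^i_j: h everywhere, or +h / -h with probability 0.5 each.
  if (random_sign) {
    signs <- sample(c(-1, 1), n_out * p, replace = TRUE)
  } else {
    signs <- rep(1, n_out * p)
  }
  mult <- matrix(h * signs, nrow = n_out, ncol = p)

  x_out <- x
  x_out[idx, ] <- mult * x[idx, , drop = FALSE]

  # First p columns: contaminated field; column p + 1: logical outlier flag.
  data.frame(x_out, is_outlier = seq_len(n) %in% idx)
}

set.seed(1)
x <- matrix(rnorm(200 * 3), nrow = 200, ncol = 3)
res <- contaminate_sketch(x, alpha = 0.1, h = 15)
sum(res$is_outlier)  # number of flagged rows, round(0.1 * 200)
```

With random_sign = FALSE every selected row is simply scaled by h, so an outlier keeps the direction of the original observation; with random_sign = TRUE the components may flip sign independently.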


Value

gen_glob_outl returns a data.frame containing the contaminated field in the first p columns. Column p + 1 contains a logical indicator of whether each observation is an outlier.
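As a rough illustration of how a return value with this layout might be taken apart, the sketch below uses a small hand-made data.frame as a stand-in. The values and the column names f1, f2 and flag are invented for the example; the indicator is accessed by position because its actual name is not stated here.

```r
# Stand-in with the documented layout: p = 2 field columns, then a logical flag.
# Values and column names are made up for illustration.
res <- data.frame(f1   = c(0.1, 1.5, -0.2, 3.0),
                  f2   = c(0.3, -2.0, 0.0, 4.5),
                  flag = c(FALSE, TRUE, FALSE, TRUE))

p <- ncol(res) - 1
field_cont <- as.matrix(res[, seq_len(p)])  # contaminated field, n x p matrix
is_outlier <- res[[p + 1]]                  # logical outlier indicator

which(is_outlier)  # row indices of the contaminated observations
mean(is_outlier)   # realised contamination rate
```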

See Also



Examples

# simulate coordinates
coords <- runif(1000 * 2) * 20
dim(coords) <- c(1000, 2)
coords_df <- as.data.frame(coords)
names(coords_df) <- c("x", "y")
# simulate random field
if (!requireNamespace('gstat', quietly = TRUE)) {
  message('Please install the package gstat to run the example code.')
} else {
  # attach gstat so that gstat() and vgm() are available
  library(gstat)
  model_1 <- gstat(formula = z ~ 1, locations = ~ x + y, dummy = TRUE, beta = 0, 
                   model = vgm(psill = 0.025, range = 1, model = 'Exp'), nmax = 20)
  model_2 <- gstat(formula = z ~ 1, locations = ~ x + y, dummy = TRUE, beta = 0, 
                   model = vgm(psill = 0.025, range = 1, kappa = 2, model = 'Mat'), 
                   nmax = 20)
  model_3 <- gstat(formula = z ~ 1, locations = ~ x + y, dummy = TRUE, beta = 0, 
                   model = vgm(psill = 0.025, range = 1, model = 'Gau'), nmax = 20)
  field_1 <- predict(model_1, newdata = coords_df, nsim = 1)$sim1
  field_2 <- predict(model_2, newdata = coords_df, nsim = 1)$sim1
  field_3 <- predict(model_3, newdata = coords_df, nsim = 1)$sim1
  field <- cbind(field_1, field_2, field_3)
  # Contaminate 10 % of the observations with global outliers of size h = 15.
  field_cont <- gen_glob_outl(field, alpha = 0.1, h = 15)
  # Contaminate 5 % of the observations with global outliers of size h = 10 and random sign.
  field_cont2 <- gen_glob_outl(field, alpha = 0.05, h = 10, random_sign = TRUE)
}

[Package SpatialBSS version 0.14-0 Index]