white_data {SpatialBSS} | R Documentation |
Different Approaches of Data Whitening
Description
white_data
whitens the data with respect to the sample covariance matrix or to different spatial scatter matrices.
Usage
white_data(x, whitening = c("standard", "rob", "hr"),
lcov = c('lcov', 'ldiff', 'lcov_norm'),
kernel_mat = numeric(0))
Arguments
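x |
a numeric matrix of dimension c(n, p), where the p columns correspond to the entries of the random field and the n rows are the observations.
whitening |
a string indicating the whitening method. If 'standard', the data are whitened with respect to the sample covariance matrix; if 'rob', a local scatter matrix selected by lcov is used instead; if 'hr', the Hettmansperger-Randles location and scatter estimates are used. Default is 'standard'.
lcov |
a string indicating which type of local covariance matrix is used for whitening, when the whitening method 'rob' is used. Either 'lcov', 'ldiff' or 'lcov_norm'. Default is 'lcov'.
kernel_mat |
a spatial kernel matrix with dimension c(n, n), see Details. Default is numeric(0).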
Details
The inverse square root of a positive definite matrix $M(f)$ with eigenvalue decomposition $M(f) = U D U^\top$ is defined as $M(f)^{-1/2} = U D^{-1/2} U^\top$.
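The definition above can be sketched in base R. This is a minimal illustration, not the package's internal code; the helper name inv_sqrt and the toy matrix m are assumptions made for the example.

```r
# Hypothetical helper: inverse square root of a symmetric positive definite
# matrix via its eigenvalue decomposition m = U D U'.
inv_sqrt <- function(m) {
  eig <- eigen(m, symmetric = TRUE)
  if (any(eig$values <= 0)) stop("matrix is not positive definite")
  d_inv_sqrt <- diag(1 / sqrt(eig$values), nrow = length(eig$values))
  eig$vectors %*% d_inv_sqrt %*% t(eig$vectors)
}

set.seed(1)
a <- matrix(rnorm(30), 10, 3)
m <- crossprod(a)              # toy positive definite matrix
m_inv_sqrt <- inv_sqrt(m)

# m^{-1/2} m m^{-1/2} should equal the identity up to rounding error
err <- max(abs(m_inv_sqrt %*% m %*% m_inv_sqrt - diag(3)))
err
```

Multiplying centered data by such a matrix is what turns the chosen scatter matrix into the identity.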
white_data whitens the data by $M(f)^{-1/2}(x - T(x))$, where $T(x)$ is a location functional of $x$ and the matrix $M(f)$ is a scatter functional. If the argument whitening is 'standard', $M(f)$ is the sample covariance matrix and $T(x)$ is the vector of column means of $x$. If the argument whitening is 'hr', the Hettmansperger-Randles location and scatter estimates (Hettmansperger & Randles, 2002) are used as the location functional $T(x)$ and the scatter functional $M(f)$. These estimates are robust variants of the sample mean and covariance matrix and are also used for whitening in robsbss. If the argument whitening is 'rob', the argument lcov determines the scatter functional $M(f)$ to be one of the following local scatter matrices:
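- 'lcov': $\mathrm{LCov}(f) = \frac{1}{n} \sum_{i,j} f(d_{i,j}) (x(s_i) - \bar{x})(x(s_j) - \bar{x})^\top$
- 'ldiff': $\mathrm{LDiff}(f) = \frac{1}{n} \sum_{i,j} f(d_{i,j}) (x(s_i) - x(s_j))(x(s_i) - x(s_j))^\top$
- 'lcov_norm': $\mathrm{LCov}^*(f) = \frac{1}{n F_{f,n}^{1/2}} \sum_{i,j} f(d_{i,j}) (x(s_i) - \bar{x})(x(s_j) - \bar{x})^\top$, with $F_{f,n} = \frac{1}{n} \sum_{i,j} f^2(d_{i,j})$,
where $d_{i,j} \ge 0$ correspond to the pairwise distances between coordinates, $x(s_i)$ are the p random field values at location $s_i$, $\bar{x}$ is the sample mean vector, and the kernel function $f(d)$ determines the locality. The choice 'lcov_norm' is useful when testing for the actual signal dimension of the latent field, see sbss_asymp and sbss_boot. See also sbss for details.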
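The three local scatter matrices can be written in matrix form, which the following base R sketch illustrates on toy data. It assumes that the local covariance is the kernel-weighted sum of outer products of centered observations divided by n, that the local difference matrix uses pairwise differences instead, and that the normalized variant divides by the square root of the mean squared kernel weight. The kernel (an indicator of 0 < d <= 0.3) and all object names are illustrative assumptions, not the package's implementation.

```r
set.seed(1)
n <- 50; p <- 3
coords <- matrix(runif(n * 2), n, 2)
x <- matrix(rnorm(n * p), n, p)            # toy data, one row per location

d <- as.matrix(dist(coords))               # pairwise distances d_ij
k <- (d > 0 & d <= 0.3) * 1                # assumed kernel matrix f(d_ij)

x_0 <- sweep(x, 2, colMeans(x))            # centered data

# Matrix forms (k is symmetric)
lcov <- crossprod(x_0, k %*% x_0) / n
ldiff <- 2 * (crossprod(x, rowSums(k) * x) - crossprod(x, k %*% x)) / n
f_fn <- sum(k^2) / n
lcov_norm <- lcov / sqrt(f_fn)

# Cross-check the local difference matrix against the explicit double sum
ldiff_loop <- matrix(0, p, p)
for (i in seq_len(n)) for (j in seq_len(n)) {
  v <- x[i, ] - x[j, ]
  ldiff_loop <- ldiff_loop + k[i, j] * outer(v, v)
}
ldiff_loop <- ldiff_loop / n
all.equal(ldiff, ldiff_loop)               # the two forms should agree
```

The matrix forms avoid the double loop, which matters for larger n because the loop visits all n^2 pairs of locations.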
Note that $\mathrm{LCov}(f)$ matrices are usually not positive definite; in that case the matrix cannot be inverted and an error is produced. Whitening with $\mathrm{LCov}(f)$ matrices might be favorable in the presence of spatially uncorrelated noise, and whitening with $\mathrm{LDiff}(f)$ might be favorable when a non-constant smooth drift is present in the data.
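The positive definiteness requirement can be checked from the eigenvalues of the scatter matrix before whitening. The sketch below does this for a local difference matrix on toy data and then whitens with it. It assumes a symmetric kernel matrix and the kernel-weighted pairwise-difference form of the matrix; the helper local_diff and the kernel are hypothetical and written only for this example.

```r
set.seed(2)
n <- 60; p <- 3
coords <- matrix(runif(n * 2), n, 2)
x <- matrix(rnorm(n * p), n, p)
d <- as.matrix(dist(coords))
k <- (d > 0 & d <= 0.3) * 1                # assumed symmetric kernel matrix

# Hypothetical helper: local difference matrix in matrix form
local_diff <- function(x, k) {
  2 * (crossprod(x, rowSums(k) * x) - crossprod(x, k %*% x)) / nrow(x)
}

s <- local_diff(x, k)
ev <- eigen(s, symmetric = TRUE)
if (min(ev$values) <= 0) stop("scatter matrix is not positive definite")

# Whiten: center, then multiply by the inverse square root of s
s_inv_sqrt <- ev$vectors %*% diag(1 / sqrt(ev$values)) %*% t(ev$vectors)
x_w <- sweep(x, 2, colMeans(x)) %*% s_inv_sqrt

# The local difference matrix of the whitened data should be the identity
err <- max(abs(local_diff(x_w, k) - diag(p)))
err
```

With non-negative kernel weights the local difference matrix is a sum of weighted outer products and therefore positive semidefinite, so the eigenvalue check guards only against rank deficiency. A local covariance matrix carries no such guarantee.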
The argument kernel_mat is a matrix of dimension c(n,n) where each entry corresponds to the spatial kernel function evaluated at the distance between two sample locations. Mathematically, the entry ij of each kernel matrix is $f(d_{i,j})$. This matrix is usually computed with the function spatial_kernel_matrix.
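To make the structure of such a matrix concrete, the following base R sketch builds one by hand. It assumes a ring kernel defined as the indicator $f(d) = I(r_{in} < d \le r_{out})$; that definition and the helper ring_kernel_matrix are assumptions for illustration and are not taken from this page. In practice spatial_kernel_matrix should be used.

```r
# Hypothetical helper: n x n matrix with entries f(d_ij) for an assumed
# ring kernel f(d) = I(r_in < d <= r_out).
ring_kernel_matrix <- function(coords, r_in, r_out) {
  d <- as.matrix(dist(coords))     # Euclidean distances between locations
  k <- (d > r_in & d <= r_out) * 1
  dimnames(k) <- NULL
  k
}

set.seed(3)
coords <- matrix(runif(20 * 2) * 20, 20, 2)
k <- ring_kernel_matrix(coords, r_in = 0, r_out = 5)

dim(k)           # one row and one column per sample location
isSymmetric(k)   # distances are symmetric, so the kernel matrix is as well
```

With an inner radius of zero the diagonal is zero under this definition, because each location has distance zero to itself.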
Value
white_data
returns a list with the following entries:
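mu |
a numeric vector of length p containing the location estimate used for centering (the column means of x for 'standard' whitening).
x_0 |
a numeric matrix of dimension c(n, p) containing the centered data.
x_w |
a numeric matrix of dimension c(n, p) containing the whitened data.
s |
a numeric matrix of dimension c(p, p) containing the scatter matrix used for whitening.
s_inv_sqrt |
a numeric matrix of dimension c(p, p) containing the inverse square root of the scatter matrix s.
s_sqrt |
a numeric matrix of dimension c(p, p) containing the square root of the scatter matrix s.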
References
Muehlmann, C., Filzmoser, P. and Nordhausen, K. (2021), Spatial Blind Source Separation in the Presence of a Drift, Submitted for publication. Preprint available at https://arxiv.org/abs/2108.13813.
Bachoc, F., Genton, M. G., Nordhausen, K., Ruiz-Gazen, A. and Virta, J. (2020), Spatial Blind Source Separation, Biometrika, 107, 627-646, doi:10.1093/biomet/asz079.
Hettmansperger, T. P. and Randles, R. H. (2002), A Practical Affine Equivariant Multivariate Median, Biometrika, 89, 851-860, doi:10.1093/biomet/89.4.851.
See Also
Examples
# simulate coordinates
coords <- runif(1000 * 2) * 20
dim(coords) <- c(1000, 2)
coords_df <- as.data.frame(coords)
names(coords_df) <- c("x", "y")
# simulate random field
if (!requireNamespace('gstat', quietly = TRUE)) {
  message('Please install the package gstat to run the example code.')
} else {
  library(gstat)
  model_1 <- gstat(formula = z ~ 1, locations = ~ x + y, dummy = TRUE, beta = 0,
                   model = vgm(psill = 0.025, range = 1, model = 'Exp'), nmax = 20)
  model_2 <- gstat(formula = z ~ 1, locations = ~ x + y, dummy = TRUE, beta = 0,
                   model = vgm(psill = 0.025, range = 1, kappa = 2, model = 'Mat'),
                   nmax = 20)
  model_3 <- gstat(formula = z ~ 1, locations = ~ x + y, dummy = TRUE, beta = 0,
                   model = vgm(psill = 0.025, range = 1, model = 'Gau'), nmax = 20)
  field_1 <- predict(model_1, newdata = coords_df, nsim = 1)$sim1
  field_2 <- predict(model_2, newdata = coords_df, nsim = 1)$sim1
  field_3 <- predict(model_3, newdata = coords_df, nsim = 1)$sim1
  field <- cbind(field_1, field_2, field_3)
  X <- as.matrix(field)

  # whiten the data with the usual sample covariance
  x_w_1 <- white_data(X)

  # whiten the data with an ldiff matrix and a ring kernel
  kernel_params_ring <- c(0, 1)
  ring_kernel_list <-
    spatial_kernel_matrix(coords, 'ring', kernel_params_ring)
  x_w_2 <- white_data(field, whitening = 'rob',
                      lcov = 'ldiff', kernel_mat = ring_kernel_list[[1]])

  # contaminate the data with 5 % global outliers
  field_cont <- gen_glob_outl(field)[, 1:3]
  X <- as.matrix(field_cont)

  # whiten the data using Hettmansperger-Randles location and scatter estimates
  x_w_3 <- white_data(X, whitening = 'hr')
}