white_data {SpatialBSS}    R Documentation
Different Approaches of Data Whitening
Description
white_data whitens the data with respect to the sample covariance matrix, robust Hettmansperger-Randles scatter estimates, or different local spatial scatter matrices.
Usage
white_data(x, whitening = c("standard", "rob", "hr"),
lcov = c('lcov', 'ldiff', 'lcov_norm'),
kernel_mat = numeric(0))
Arguments
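x
a numeric matrix of dimension c(n, p), where the n rows are the observations and the p columns correspond to the entries of the random field.
whitening
a string indicating the whitening method. If 'standard', the sample mean and sample covariance matrix are used; if 'hr', the Hettmansperger-Randles location and scatter estimates are used; if 'rob', a local scatter matrix selected by the argument lcov is used. See Details.
lcov
a string indicating which type of local covariance matrix is used for whitening when the whitening method 'rob' is chosen: one of 'lcov', 'ldiff' or 'lcov_norm'. See Details.
kernel_mat
a spatial kernel matrix of dimension c(n, n), usually computed with the function spatial_kernel_matrix.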
Details
The inverse square root of a positive definite matrix M(x) with eigenvalue decomposition UDU' is defined as M(x)^{-1/2} = UD^{-1/2}U'. white_data whitens the data by M(x)^{-1/2}(x - T(x)), where T(x) is a location functional of x and the matrix M(x) is a scatter functional.

If the argument whitening is 'standard', M(x) is the sample covariance matrix and T(x) is the vector of column means of x. If the argument whitening is 'hr', the Hettmansperger-Randles location and scatter estimates (Hettmansperger and Randles, 2002) are used as location functional T(x) and scatter functional M(x). These estimates are robust alternatives to the sample mean vector and covariance matrix and are also used for whitening in robsbss.

If the argument whitening is 'rob', the argument lcov determines the scatter functional M(x) to be one of the following local scatter matrices:
- 'lcov': LCov(f) = 1/n \sum_{i,j} f(d_{i,j}) (x(s_i)-\bar{x}) (x(s_j)-\bar{x})',
- 'ldiff': LDiff(f) = 1/n \sum_{i,j} f(d_{i,j}) (x(s_i)-x(s_j)) (x(s_i)-x(s_j))',
- 'lcov_norm': LCov^*(f) = 1/(n F^{1/2}_{f,n}) \sum_{i,j} f(d_{i,j}) (x(s_i)-\bar{x}) (x(s_j)-\bar{x})', with F_{f,n} = 1/n \sum_{i,j} f^2(d_{i,j}),

where d_{i,j} \ge 0 are the pairwise distances between the sample locations, x(s_i) is the p-variate random field value at location s_i, \bar{x} is the sample mean vector, and the kernel function f(d) determines the locality. The choice 'lcov_norm' is useful when testing for the actual signal dimension of the latent field; see sbss_asymp and sbss_boot. See also sbss for details.
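The three formulas can be written compactly in matrix form. The following base-R sketch assumes data x with the n observations in rows and a kernel matrix k with k[i, j] = f(d_{i,j}); the helper local_scatter is hypothetical and is not the SpatialBSS implementation. LDiff(f) uses the expansion of the double sum into X'(diag(rowSums(K) + colSums(K)) - K - K')X, which avoids the explicit loop over all pairs. The indicator kernel in the usage lines is an assumption chosen only for illustration.

```r
# Hypothetical helper (not part of SpatialBSS): local scatter matrices from
# data x (n x p) and a kernel matrix k (n x n) with k[i, j] = f(d_ij).
local_scatter <- function(x, k, type = c("lcov", "ldiff", "lcov_norm")) {
  type <- match.arg(type)
  x <- as.matrix(x)
  n <- nrow(x)
  if (type == "ldiff") {
    # sum_{i,j} f(d_ij) (x_i - x_j)(x_i - x_j)' expands to
    # X' (diag(rowSums(K) + colSums(K)) - K - K') X
    l <- diag(rowSums(k) + colSums(k), n) - k - t(k)
    return(crossprod(x, l %*% x) / n)
  }
  x_c <- sweep(x, 2, colMeans(x))         # rows are x(s_i) - x_bar
  lcov <- crossprod(x_c, k %*% x_c) / n   # 1/n sum_{i,j} f(d_ij) (x_i - x_bar)(x_j - x_bar)'
  if (type == "lcov") return(lcov)
  f_fn <- sum(k^2) / n                    # F_{f,n} = 1/n sum_{i,j} f^2(d_ij)
  lcov / sqrt(f_fn)                       # LCov^*(f)
}

set.seed(2)
coords <- matrix(runif(60 * 2) * 5, ncol = 2)
x <- matrix(rnorm(60 * 3), ncol = 3)
d <- as.matrix(dist(coords))
k <- (d > 0 & d <= 1) * 1   # assumed indicator kernel: f(d) = 1 for 0 < d <= 1
local_scatter(x, k, "lcov")
local_scatter(x, k, "ldiff")
local_scatter(x, k, "lcov_norm")
```

The matrix form and a direct double loop over all pairs (i, j) give the same result; the matrix form is simply faster for large n.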
Note that LCov(f) matrices are usually not positive definite; in that case the inverse square root M(x)^{-1/2} does not exist and an error is produced. Whitening with LCov(f) matrices might be favorable in the presence of spatially uncorrelated noise, whereas whitening with LDiff(f) matrices might be favorable when a non-constant smooth drift is present in the data.
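Why a non-positive-definite scatter matrix cannot be used for whitening can be illustrated with a small base-R sketch. The helper inv_sqrt_pd and its relative tolerance tol are hypothetical (not part of SpatialBSS); the helper inspects the eigenvalues before forming UD^{-1/2}U' and stops when the smallest one is not clearly positive. This is one plausible way such an error can arise, not necessarily the exact check white_data performs.

```r
# Hypothetical helper: inverse square root that refuses non-positive-definite input.
inv_sqrt_pd <- function(m, tol = 1e-10) {
  e <- eigen(m, symmetric = TRUE)                 # m = U D U'
  if (min(e$values) <= tol * max(abs(e$values))) {
    stop("matrix is not positive definite, so M(x)^{-1/2} is not defined")
  }
  e$vectors %*% diag(1 / sqrt(e$values), nrow(m)) %*% t(e$vectors)  # U D^{-1/2} U'
}

m_pd <- matrix(c(2, 0.5, 0.5, 1), nrow = 2)     # both eigenvalues positive
m_indef <- matrix(c(1, 2, 2, 1), nrow = 2)      # eigenvalues 3 and -1
inv_sqrt_pd(m_pd)
res <- try(inv_sqrt_pd(m_indef), silent = TRUE)
inherits(res, "try-error")                      # the indefinite matrix is rejected
```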
The argument kernel_mat is a matrix of dimension c(n,n) in which each entry is the spatial kernel function evaluated at the distance between two sample locations; mathematically, entry ij of the kernel matrix is f(d_{i,j}). This matrix is usually computed with the function spatial_kernel_matrix.
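For intuition, a kernel matrix of this form can be built directly from the pairwise distances. The base-R sketch below assumes a ring kernel f(d) = 1 if r_in < d <= r_out and 0 otherwise; the helper ring_kernel is hypothetical, and its output may differ in details from what spatial_kernel_matrix returns, so the package function remains the intended way to obtain kernel_mat.

```r
# Hypothetical helper: ring kernel matrix, assuming f(d) = I(r_in < d <= r_out).
ring_kernel <- function(coords, r_in, r_out) {
  d <- as.matrix(dist(coords))   # n x n matrix of pairwise Euclidean distances d_ij
  (d > r_in & d <= r_out) * 1    # entry ij is f(d_ij)
}

set.seed(3)
coords <- matrix(runif(100 * 2) * 20, ncol = 2)
k <- ring_kernel(coords, r_in = 0, r_out = 1)
dim(k)            # c(n, n) with n = 100
isSymmetric(k)    # symmetric, because d_ij = d_ji
```

With r_in = 0 the diagonal is zero, since d_{i,i} = 0 does not satisfy 0 < d.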
Value
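white_data returns a list with the following entries:
mu
a numeric vector of length p containing the location estimate T(x).
x_0
a numeric matrix of dimension c(n, p) containing the centered data x - T(x).
x_w
a numeric matrix of dimension c(n, p) containing the whitened data.
s
a numeric matrix of dimension c(p, p) containing the scatter matrix M(x) used for whitening.
s_inv_sqrt
a numeric matrix of dimension c(p, p) containing the inverse square root M(x)^{-1/2} of the scatter matrix.
s_sqrt
a numeric matrix of dimension c(p, p) containing the square root M(x)^{1/2} of the scatter matrix.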
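How these entries fit together can be seen in a minimal base-R sketch of the 'standard' case, assuming the sample mean and sample covariance matrix as T(x) and M(x) as described in Details. The helper whiten_standard is hypothetical: it only mirrors the entry names mu, x_0, x_w, s, s_inv_sqrt and s_sqrt and is not the SpatialBSS implementation. It forms M(x)^{-1/2} = UD^{-1/2}U' with eigen(), checks that the whitened data have identity covariance, and undoes the whitening with s_sqrt and mu.

```r
# Hypothetical helper (not the SpatialBSS implementation): 'standard' whitening
# returning entries named like the Value section.
whiten_standard <- function(x) {
  x <- as.matrix(x)
  p <- ncol(x)
  mu <- colMeans(x)                          # T(x): column means
  x_0 <- sweep(x, 2, mu)                     # centered data x - T(x)
  s <- cov(x)                                # M(x): sample covariance matrix
  e <- eigen(s, symmetric = TRUE)            # M(x) = U D U'
  u <- e$vectors
  s_inv_sqrt <- u %*% diag(1 / sqrt(e$values), p) %*% t(u)  # U D^{-1/2} U'
  s_sqrt <- u %*% diag(sqrt(e$values), p) %*% t(u)          # U D^{1/2} U'
  x_w <- x_0 %*% s_inv_sqrt                  # rows are observations, so multiply from the right
  list(mu = mu, x_0 = x_0, x_w = x_w, s = s,
       s_inv_sqrt = s_inv_sqrt, s_sqrt = s_sqrt)
}

set.seed(1)
a <- matrix(c(2, 1, 0, 0, 1, 0, 1, 0, 3), nrow = 3)   # arbitrary invertible mixing matrix
x <- matrix(rnorm(500 * 3), ncol = 3) %*% a
res <- whiten_standard(x)

round(cov(res$x_w), 8)                        # approximately the 3 x 3 identity matrix
# undo the whitening: x = x_w M(x)^{1/2} + T(x)
x_back <- sweep(res$x_w %*% res$s_sqrt, 2, res$mu, "+")
all.equal(x_back, x)
```

Because s_sqrt %*% s_inv_sqrt is the identity, the back-transformation recovers the original data up to floating-point error.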
References
Muehlmann, C., Filzmoser, P. and Nordhausen, K. (2021), Spatial Blind Source Separation in the Presence of a Drift, Submitted for publication. Preprint available at https://arxiv.org/abs/2108.13813.
Bachoc, F., Genton, M. G., Nordhausen, K., Ruiz-Gazen, A. and Virta, J. (2020), Spatial Blind Source Separation, Biometrika, 107, 627-646, doi:10.1093/biomet/asz079.
Hettmansperger, T. P. and Randles, R. H. (2002), A Practical Affine Equivariant Multivariate Median, Biometrika, 89, 851-860, doi:10.1093/biomet/89.4.851.
Examples
library(SpatialBSS)
# simulate coordinates
coords <- runif(1000 * 2) * 20
dim(coords) <- c(1000, 2)
coords_df <- as.data.frame(coords)
names(coords_df) <- c("x", "y")
# simulate random field
if (!requireNamespace('gstat', quietly = TRUE)) {
  message('Please install the package gstat to run the example code.')
} else {
  library(gstat)
  model_1 <- gstat(formula = z ~ 1, locations = ~ x + y, dummy = TRUE, beta = 0,
                   model = vgm(psill = 0.025, range = 1, model = 'Exp'), nmax = 20)
  model_2 <- gstat(formula = z ~ 1, locations = ~ x + y, dummy = TRUE, beta = 0,
                   model = vgm(psill = 0.025, range = 1, kappa = 2, model = 'Mat'),
                   nmax = 20)
  model_3 <- gstat(formula = z ~ 1, locations = ~ x + y, dummy = TRUE, beta = 0,
                   model = vgm(psill = 0.025, range = 1, model = 'Gau'), nmax = 20)
  field_1 <- predict(model_1, newdata = coords_df, nsim = 1)$sim1
  field_2 <- predict(model_2, newdata = coords_df, nsim = 1)$sim1
  field_3 <- predict(model_3, newdata = coords_df, nsim = 1)$sim1
  field <- cbind(field_1, field_2, field_3)
  X <- as.matrix(field)

  # whiten the data with the usual sample covariance matrix
  x_w_1 <- white_data(X)

  # whiten the data with an LDiff matrix and a ring kernel
  # (inner radius 0, outer radius 1)
  kernel_params_ring <- c(0, 1)
  ring_kernel_list <- spatial_kernel_matrix(coords, 'ring', kernel_params_ring)
  x_w_2 <- white_data(X, whitening = 'rob',
                      lcov = 'ldiff', kernel_mat = ring_kernel_list[[1]])

  # contaminate 5 % of the observations with global outliers
  field_cont <- gen_glob_outl(field)[, 1:3]
  X_cont <- as.matrix(field_cont)
  # whiten the contaminated data using Hettmansperger-Randles location and scatter estimates
  x_w_3 <- white_data(X_cont, whitening = 'hr')
}