Sphdist {PSinference} | R Documentation |
Spherical Empirical Distribution
Description
This function calculates the empirical distribution of the pivotal random
variable that can be used to perform the sphericity test of the population covariance matrix
$\boldsymbol{\Sigma}$, that is, the test of $\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_p$,
based on the released single synthetic dataset generated under Plug-in Sampling,
assuming that the original dataset is normally distributed.
Usage
Sphdist(nsample, pvariates, iterations)
Arguments
nsample |
Sample size. |
pvariates |
Number of variables. |
iterations |
Number of iterations for simulating values from the
distribution and finding the quantiles. Default is |
Details
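We define
$$T^\star = \frac{|\mathbf{S}^\star|^{1/p}}{\operatorname{tr}(\mathbf{S}^\star)/p},$$
where $\mathbf{S}^\star = \sum_{i=1}^{n} (\mathbf{v}_i - \bar{\mathbf{v}})(\mathbf{v}_i - \bar{\mathbf{v}})^\top$,
$\mathbf{v}_i$ is the $i$th observation of the synthetic dataset, $\bar{\mathbf{v}}$ is the synthetic sample mean,
$n$ is the sample size and $p$ is the number of variables.
For $\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_p$, its distribution is
stochastically equivalent to
$$\frac{|\boldsymbol{\Omega}_1 \boldsymbol{\Omega}_2|^{1/p}}{\operatorname{tr}(\boldsymbol{\Omega}_1 \boldsymbol{\Omega}_2)/p},$$
where $\boldsymbol{\Omega}_1$ and $\boldsymbol{\Omega}_2$ are
Wishart random variables,
$\boldsymbol{\Omega}_1 \sim W_p(n-1, \mathbf{I}_p/(n-1))$ is independent of
$\boldsymbol{\Omega}_2 \sim W_p(n-1, \mathbf{I}_p)$.
By the inequality between the geometric and arithmetic means of the eigenvalues of $\mathbf{S}^\star$,
$T^\star \le 1$, with equality only when all eigenvalues are equal, so small values of $T^\star$ are evidence against sphericity.
To test $H_0: \boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_p$, compute the observed value of $T^\star$,
$\widetilde{T^\star}$, from the observed synthetic data
and reject the null hypothesis if $\widetilde{T^\star} < t^\star_{\alpha}$
at the $\alpha$ significance level, where $t^\star_{\gamma}$ is the
$\gamma$th percentile of $T^\star$.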
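The Wishart representation in this section lends itself to a direct Monte Carlo scheme. The following is a minimal sketch under stated assumptions, not the package's actual implementation: the function name `sim_sph_null` is hypothetical, and the only library call used is `rWishart()` from the base `stats` package. Because the statistic is at most one and equals one only when the sample eigenvalues coincide, the sketch takes a lower quantile as the critical value.

```r
# Hypothetical helper (not part of PSinference): simulate the null
# distribution of the sphericity statistic from two independent Wisharts.
sim_sph_null <- function(nsample, pvariates, iterations) {
  df  <- nsample - 1
  I_p <- diag(pvariates)
  # Omega_1 ~ W_p(n - 1, I_p / (n - 1)) and Omega_2 ~ W_p(n - 1, I_p)
  W1 <- rWishart(iterations, df, I_p / df)
  W2 <- rWishart(iterations, df, I_p)
  vapply(seq_len(iterations), function(k) {
    M <- W1[, , k] %*% W2[, , k]
    # |M|^(1/p) divided by tr(M)/p
    det(M)^(1 / pvariates) / (sum(diag(M)) / pvariates)
  }, numeric(1))
}

set.seed(123)
T_sim <- sim_sph_null(nsample = 100, pvariates = 4, iterations = 2000)
# Lower 5% quantile, used as the critical value of a 5% level test
t_crit <- quantile(T_sim, 0.05)
```

The scale matrix of the first Wishart does not affect the result, since the statistic is unchanged when its argument is multiplied by a positive constant.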
Value
A numeric vector of length iterations
containing the simulated values of the pivotal statistic, which together form its empirical distribution.
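One way to use the returned vector is to turn it into an empirical lower-tail p-value for an observed statistic. The sketch below assumes a hypothetical helper `sph_pvalue`. To stay runnable without the package, it uses placeholder Beta draws in place of real `Sphdist()` output; these placeholders are an illustration only and do not reproduce the true distribution.

```r
# Hypothetical helper: share of simulated values at or below the observed one.
sph_pvalue <- function(T_sim, T_obs) {
  mean(T_sim <= T_obs)
}

# Placeholder values standing in for the output of Sphdist(); they are
# NOT real output and only make the snippet runnable on its own.
set.seed(42)
T_sim_demo <- rbeta(5000, shape1 = 40, shape2 = 2)

p_small <- sph_pvalue(T_sim_demo, T_obs = 0.80)  # far in the lower tail
p_large <- sph_pvalue(T_sim_demo, T_obs = 0.99)  # near the upper end
```

In practice `T_sim_demo` would be replaced by the vector returned from `Sphdist()`, and a small p-value would count as evidence against sphericity.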
References
Klein, M., Moura, R. and Sinha, B. (2021). Multivariate Normal Inference based on Singly Imputed Synthetic Data under Plug-in Sampling. Sankhya B 83, 273–287.
Examples
# Original data created
library(MASS)        # provides mvrnorm()
library(PSinference) # provides simSynthData() and Sphdist()
mu <- c(1,2,3,4)
Sigma <- matrix(c(1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
0, 0, 0, 1), nrow = 4, ncol = 4, byrow = TRUE)
set.seed(1)  # make the simulation reproducible
n_sample = 100
# Create original simulated dataset
df = mvrnorm(n_sample, mu = mu, Sigma = Sigma)
# Create the synthetic dataset
df_s = simSynthData(df)
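# Simulate the null distribution of the pivotal statistic
p = dim(df_s)[2]
T_sph <- Sphdist(nsample = n_sample, pvariates = p, iterations = 10000)
# The statistic is at most one, and small values point away from sphericity,
# so a 5% level test compares against the lower 0.05 quantile
q05 <- quantile(T_sph, 0.05)
# Compute the observed value of T from the synthetic dataset
# (the statistic is scale invariant, so the factor n_sample - 1 does not change it)
S_star = cov(df_s) * (n_sample - 1)
T_obs = (det(S_star)^(1/p))/(sum(diag(S_star))/p)
print(q05)
print(T_obs)
# If the observed value is larger than the 5% quantile,
# there is no statistical evidence to reject sphericity.
#
# Note that the observed value is very close to one, its value under exact sphericity.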
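
The observed statistic computed in the example can be wrapped in a small helper. The sketch below is an illustration on toy data, and the name `sph_stat` is hypothetical rather than a function of the package. It checks two properties used in this page: the statistic does not change when the data are rescaled, and it drops when the variances are unequal.

```r
# Hypothetical helper: sphericity statistic from a data matrix
# (rows are observations, columns are variables).
sph_stat <- function(X) {
  p <- ncol(X)
  S <- cov(X) * (nrow(X) - 1)  # matrix of sums of squares and cross-products
  det(S)^(1 / p) / (sum(diag(S)) / p)
}

set.seed(2021)
X <- matrix(rnorm(100 * 4), nrow = 100, ncol = 4)  # toy spherical data
T_a <- sph_stat(X)
T_b <- sph_stat(10 * X)                   # same data on another scale
X_ns <- sweep(X, 2, c(1, 2, 4, 8), `*`)   # columns with unequal variances
T_c <- sph_stat(X_ns)
```

Here `T_a` and `T_b` agree up to floating point error, while `T_c` is smaller than `T_a` because the rescaled columns violate sphericity.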