Sphdist {PSinference} | R Documentation |
Spherical Empirical Distribution
Description
This function calculates the empirical distribution of the pivotal random
variable that can be used to perform the Sphericity test of the population covariance matrix
\boldsymbol{\Sigma}
that is \boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_p
,
based on the released Single Synthetic data generated under Plug-in Sampling,
assuming that the original dataset is normally distributed.
Usage
Sphdist(nsample, pvariates, iterations)
Arguments
nsample |
Sample size. |
pvariates |
Number of variables. |
iterations |
Number of iterations for simulating values from the
distribution and finding the quantiles. Default is |
Details
We define
T_2^\star = \frac{|\boldsymbol{S}^{\star}|^{\frac{1}{p}}}{tr(\boldsymbol{S}^{\star})/p}
where \boldsymbol{S}^\star = \sum_{i=1}^n (v_i - \bar{v})(v_i - \bar{v})^{\top}
,
v_i
is the i
th observation of the synthetic dataset.
For \boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_p
, its distribution is
stochastic equivalent to
\frac{|\boldsymbol{\Omega}_{1}\boldsymbol{\Omega}_{2}|^{\frac{1}{p}}}{tr(\boldsymbol{\Omega}_{1}\boldsymbol{\Omega}_{2})/p}
where \boldsymbol{\Omega}_1
and \boldsymbol{\Omega}_2
are
Wishart random variables,
\boldsymbol{\Omega}_1 \sim \mathcal{W}_p(n-1, \frac{\mathbf{I}_p}{n-1})
is independent of \boldsymbol{\Omega}_2 \sim \mathcal{W}_p(n-1, \mathbf{I}_p)
.
To test \mathcal{H}_0: \boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_p
, compute the observed value of
T_{2}^\star
, \widetilde{T_{2}^\star}
, with the observed values
and reject the null hypothesis if
\widetilde{T_{2}^\star}>t^\star_{2,\alpha}
for \alpha
-significance level, where t^\star_{2,\gamma}
is the \gamma
th percentile of T_2^\star
.
Value
a vector of length iterations
that recorded the empirical distribution's values.
References
Klein, M., Moura, R. and Sinha, B. (2021). Multivariate Normal Inference based on Singly Imputed Synthetic Data under Plug-in Sampling. Sankhya B 83, 273–287.
Examples
# Original data created
library(MASS)
mu <- c(1,2,3,4)
Sigma <- matrix(c(1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
0, 0, 0, 1), nrow = 4, ncol = 4, byrow = TRUE)
seed = 1
n_sample = 100
# Create original simulated dataset
df = mvrnorm(n_sample, mu = mu, Sigma = Sigma)
# Sinthetic data created
df_s = simSynthData(df)
# Gather the 0.95 quantile
p = dim(df_s)[2]
T_sph <- Sphdist(nsample = n_sample, pvariates = p, iterations = 10000)
q95 <- quantile(T_sph, 0.95)
# Compute the observed value of T from the synthetic dataset
S_star = cov(df_s*(n_sample-1))
T_obs = (det(S_star)^(1/p))/(sum(diag(S_star))/p)
print(q95)
print(T_obs)
#Since the observed value is bigger than the 95% quantile,
#we don't have statistical evidences to reject the Sphericity property.
#
#Note that the value is very close to one