GVdist {PSinference} | R Documentation |
Generalized Variance Empirical Distribution
Description
This function simulates the empirical distribution of the pivotal random variable used for inference on the generalized variance from a released singly imputed synthetic dataset generated under plug-in sampling, assuming that the original data are multivariate normal.
Usage
GVdist(nsample, pvariates, iterations = 10000)
Arguments
nsample | Sample size. |
pvariates | Number of variables. |
iterations | Number of iterations for simulating values from the distribution and finding the quantiles. Default is 10000. |
Details
We define
T_1^\star = (n-1)^p \frac{|\boldsymbol{S}^\star|}{|\boldsymbol{\Sigma}|},
where \boldsymbol{S}^\star = \sum_{i=1}^n (v_i - \bar{v})(v_i - \bar{v})^{\top}, \boldsymbol{\Sigma} is the population covariance matrix, v_i is the ith observation of the synthetic dataset, n is the sample size (nsample) and p is the number of variables (pvariates).
Its distribution is stochastically equivalent to
\prod_{i=1}^p \chi_{n-i}^2 \prod_{i=1}^p \chi_{n-i}^2,
where all 2p chi-square random variables \chi_{n-i}^2 are mutually independent.
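The distributional result above can be simulated directly. The following is a minimal sketch, not the package's implementation of GVdist: the helper name sim_gv_pivot is hypothetical, and it assumes each product runs over i = 1, ..., p so that all degrees of freedom n - i are positive.

```r
# Hypothetical helper (an illustration only, not the code behind GVdist):
# draws `iterations` values of a product of two independent sets of
# chi-square variables with n-1, ..., n-p degrees of freedom.
sim_gv_pivot <- function(n, p, iterations = 10000) {
  stopifnot(n > p, p >= 1, iterations >= 1)
  df <- n - seq_len(p)  # degrees of freedom n-1, ..., n-p
  draw_product <- function() {
    # Column j holds `iterations` draws from a chi-square with df[j] d.f.
    draws <- matrix(rchisq(iterations * p, df = rep(df, each = iterations)),
                    nrow = iterations, ncol = p)
    # Row-wise product, computed on the log scale for numerical stability
    exp(rowSums(log(draws)))
  }
  # Two independent sets of p chi-square factors
  draw_product() * draw_product()
}

set.seed(1)
t_sim <- sim_gv_pivot(n = 100, p = 4, iterations = 5000)
quantile(t_sim, c(0.025, 0.975))
```

The empirical quantiles of the returned vector play the role of the quantiles used in the confidence interval; increasing iterations reduces their Monte Carlo error.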
The (1-\alpha) level confidence interval for |\boldsymbol{\Sigma}| is given by
\left(\frac{(n-1)^p|\tilde{\boldsymbol{S}}^\star|}{t^\star_{1,1-\alpha/2}}, \frac{(n-1)^p|\tilde{\boldsymbol{S}}^\star|}{t^\star_{1,\alpha/2}} \right),
where \tilde{\boldsymbol{S}}^\star is the observed value of \boldsymbol{S}^\star and t^\star_{1,\gamma} is the \gamma th quantile of T_1^\star.
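The interval is obtained by inverting the pivot. As a short worked step, taking the pivot in the form (n-1)^p |\boldsymbol{S}^\star| / |\boldsymbol{\Sigma}| (the form that matches the numerator of the interval), the probability statement can be rearranged as:

```latex
\begin{align*}
1-\alpha
&= P\left( t^\star_{1,\alpha/2} \le (n-1)^p \frac{|\boldsymbol{S}^\star|}{|\boldsymbol{\Sigma}|} \le t^\star_{1,1-\alpha/2} \right) \\
&= P\left( \frac{(n-1)^p |\boldsymbol{S}^\star|}{t^\star_{1,1-\alpha/2}} \le |\boldsymbol{\Sigma}| \le \frac{(n-1)^p |\boldsymbol{S}^\star|}{t^\star_{1,\alpha/2}} \right).
\end{align*}
```

Dividing by the larger quantile gives the lower endpoint, which is why the upper quantile appears in the left bound.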
Value
A numeric vector of length iterations containing the simulated values that form the empirical distribution of the pivotal quantity.
References
Klein, M., Moura, R. and Sinha, B. (2021). Multivariate Normal Inference based on Singly Imputed Synthetic Data under Plug-in Sampling. Sankhya B 83, 273–287.
Examples
# Original data creation
library(MASS)
library(PSinference)
set.seed(1)
mu <- c(1, 2, 3, 4)
Sigma <- matrix(c(1, 0.5, 0.5, 0.5,
                  0.5, 1, 0.5, 0.5,
                  0.5, 0.5, 1, 0.5,
                  0.5, 0.5, 0.5, 1), nrow = 4, ncol = 4, byrow = TRUE)
n_sample <- 100
# Create original simulated dataset
df <- mvrnorm(n_sample, mu = mu, Sigma = Sigma)
# Create the synthetic dataset
df_s <- simSynthData(df)
# Simulate the pivot, take its 0.025 and 0.975 quantiles, and construct
# a 95% confidence interval for the generalized variance |Sigma|
p <- ncol(df_s)
T_sim <- GVdist(n_sample, p, 10000)  # avoid the name T, which masks TRUE
q975 <- quantile(T_sim, 0.975)
q025 <- quantile(T_sim, 0.025)
S_star <- cov(df_s) * (n_sample - 1)  # sum-of-squares matrix of the synthetic data
left <- (n_sample - 1)^p * det(S_star) / q975
right <- (n_sample - 1)^p * det(S_star) / q025
cat(left, right, '\n')
print(det(Sigma))
# Check that the true generalized variance det(Sigma) lies inside the interval