canodist {PSinference}    R Documentation

Canonical Empirical Distribution

Description

This function computes the empirical distribution of the pivotal random variable used for inference on the regression of one subset of variables on the other, based on a released single synthetic dataset generated under Plug-in Sampling. The original dataset is assumed to be normally distributed.

Usage

canodist(part, nsample, pvariates, iterations)

Arguments

part

Number of partitions.

nsample

Sample size.

pvariates

Number of variables.

iterations

Number of iterations for simulating values from the distribution and finding the quantiles. Default is 10000.

Details

We define

$$T_4^\star \mid \boldsymbol{\Delta} = \frac{\left| \left(\boldsymbol{S}^{\star}_{12} (\boldsymbol{S}^{\star}_{22})^{-1} - \boldsymbol{\Delta}\right) \boldsymbol{S}^{\star}_{22} \left(\boldsymbol{S}^{\star}_{12} (\boldsymbol{S}^{\star}_{22})^{-1} - \boldsymbol{\Delta}\right)^\top \right|}{\left| \boldsymbol{S}^{\star}_{11.2} \right|}$$

where $\boldsymbol{S}^\star = \sum_{i=1}^n (v_i - \bar{v})(v_i - \bar{v})^{\top}$, $v_i$ is the $i$th observation of the synthetic dataset, $\boldsymbol{S}^{\star}_{11.2} = \boldsymbol{S}^{\star}_{11} - \boldsymbol{S}^{\star}_{12} (\boldsymbol{S}^{\star}_{22})^{-1} \boldsymbol{S}^{\star}_{21}$, and $\boldsymbol{S}^\star$ is partitioned as

$$\boldsymbol{S}^{\star} = \left[\begin{array}{ll} \boldsymbol{S}^{\star}_{11} & \boldsymbol{S}^{\star}_{12} \\ \boldsymbol{S}^{\star}_{21} & \boldsymbol{S}^{\star}_{22} \end{array}\right].$$
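
As a minimal base-R sketch of these definitions, the scatter matrix and its blocks can be formed by direct indexing. The data `V`, the block size `p1`, and all object names below are illustrative assumptions and not part of the package; `S11_2` is taken as S11 - S12 S22^{-1} S21, as in the Examples code.

```r
set.seed(1)
n  <- 50    # illustrative sample size
p  <- 4     # total number of variables
p1 <- 2     # assumed number of variables in the first block

# Stand-in for a synthetic dataset (rows are observations v_i)
V <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Scatter matrix S* = sum_i (v_i - vbar)(v_i - vbar)^T
Vc    <- scale(V, center = TRUE, scale = FALSE)
Sstar <- crossprod(Vc)

# S* equals (n - 1) times the sample covariance matrix
same_up_to_scale <- isTRUE(all.equal(Sstar, (n - 1) * cov(V)))

# Blocks of S* obtained by row and column index sets
i1 <- seq_len(p1)
i2 <- (p1 + 1):p
S11 <- Sstar[i1, i1, drop = FALSE]
S12 <- Sstar[i1, i2, drop = FALSE]
S21 <- Sstar[i2, i1, drop = FALSE]
S22 <- Sstar[i2, i2, drop = FALSE]

# Schur complement of S22 in S*
S11_2 <- S11 - S12 %*% solve(S22) %*% S21
```

Because the numerator and denominator of the statistic are determinants of matrices of the same dimension, multiplying S* by a positive constant leaves the ratio unchanged, so using `cov()` in place of the scatter matrix gives the same value.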

For $\boldsymbol{\Delta} = \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}$, where $\boldsymbol{\Sigma}$ is partitioned in the same way as $\boldsymbol{S}^{\star}$, the distribution of $T_4^\star \mid \boldsymbol{\Delta}$ is stochastically equivalent to

$$\frac{\left| \boldsymbol{\Omega}_{12} \boldsymbol{\Omega}_{22}^{-1} \boldsymbol{\Omega}_{21} \right|}{\left| \boldsymbol{\Omega}_{11} - \boldsymbol{\Omega}_{12} \boldsymbol{\Omega}_{22}^{-1} \boldsymbol{\Omega}_{21} \right|}$$

where $\boldsymbol{\Omega} \sim \mathcal{W}_p(n-1, \frac{\boldsymbol{W}}{n-1})$, $\boldsymbol{W} \sim \mathcal{W}_p(n-1, \mathbf{I}_p)$, and $\boldsymbol{\Omega}$ is partitioned in the same way as $\boldsymbol{S}^{\star}$.

To test $\mathcal{H}_0: \boldsymbol{\Delta} = \boldsymbol{\Delta}_0$, compute the observed value of $T_4^\star$, denoted $\widetilde{T_4^\star}$, and reject the null hypothesis at significance level $\alpha$ if $\widetilde{T_4^\star} > t^\star_{4,1-\alpha}$, where $t^\star_{4,\gamma}$ is the $\gamma$th quantile of $T_4^\star$.
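
The stochastic representation above can be simulated directly with `stats::rWishart`. The following is only a sketch under stated assumptions: `simulate_T4` is a hypothetical helper, not the package's implementation of `canodist`, and it assumes that the first block contains `p1` variables.

```r
# Hypothetical helper: draws from the two-stage Wishart representation.
# This is NOT the PSinference implementation.
simulate_T4 <- function(p1, n, p, iterations = 1000) {
  stopifnot(p1 >= 1, p1 < p, n - 1 >= p)
  i1 <- seq_len(p1)
  i2 <- (p1 + 1):p
  vapply(seq_len(iterations), function(k) {
    # W ~ W_p(n - 1, I_p)
    W <- stats::rWishart(1, df = n - 1, Sigma = diag(p))[, , 1]
    # Omega ~ W_p(n - 1, W / (n - 1))
    Omega <- stats::rWishart(1, df = n - 1, Sigma = W / (n - 1))[, , 1]
    O11 <- Omega[i1, i1, drop = FALSE]
    O12 <- Omega[i1, i2, drop = FALSE]
    O22 <- Omega[i2, i2, drop = FALSE]
    # Omega_12 Omega_22^{-1} Omega_21, using Omega_21 = t(Omega_12)
    B <- O12 %*% solve(O22, t(O12))
    det(B) / det(O11 - B)
  }, numeric(1))
}

set.seed(123)
T4_sim <- simulate_T4(p1 = 2, n = 100, p = 4, iterations = 2000)
crit_95 <- quantile(T4_sim, 0.95)   # empirical critical value at alpha = 0.05
```

Each draw takes two Wishart samples, so the cost grows linearly with `iterations`; a larger number of iterations gives a more stable estimate of the upper quantile.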

Value

A numeric vector of length iterations containing the simulated values of the empirical distribution.
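
A vector of simulated values like this can be turned into a critical value and a Monte Carlo p-value. The sketch below is illustrative only: `empirical_test` is a hypothetical helper, and the placeholder vector `sim_values` and the observed value `obs_value` are made-up numbers standing in for the output of `canodist` and an observed statistic.

```r
# Hypothetical helper: upper-tail test from a vector of simulated values
empirical_test <- function(sim, obs, alpha = 0.05) {
  crit <- unname(stats::quantile(sim, probs = 1 - alpha))  # empirical critical value
  list(
    critical_value = crit,
    p_value        = mean(sim >= obs),   # share of simulated values at or above obs
    reject         = obs > crit
  )
}

# Placeholder inputs, NOT real draws of the statistic
set.seed(42)
sim_values <- rexp(10000, rate = 1e4)
obs_value  <- 2e-4

res <- empirical_test(sim_values, obs_value, alpha = 0.05)
```

Reporting the p-value alongside the reject decision shows how far the observed statistic lies in the upper tail, which a single comparison with the 95% quantile does not.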

References

Klein, M., Moura, R. and Sinha, B. (2021). Multivariate Normal Inference based on Singly Imputed Synthetic Data under Plug-in Sampling. Sankhya B 83, 273–287.

Examples

# generate original data
library(MASS)         # provides mvrnorm()
library(PSinference)  # provides simSynthData(), partition(), canodist()
n_sample = 100
p = 4
mu <- c(1,2,3,4)
Sigma = matrix(c(1,   0.5, 0.1, 0.7,
                 0.5,   2, 0.4, 0.9,
                 0.1, 0.4,   3, 0.2,
                 0.7, 0.9, 0.2,   4), nrow = 4, ncol = 4, byrow = TRUE)

df = mvrnorm(n_sample, mu = mu, Sigma = Sigma)
# generate synthetic data
df_s = simSynthData(df)
# partition Sigma and Sstar into blocks
part = 2
Sigma_12 = partition(Sigma,nrows = part, ncol = part)[[2]]
Sigma_22 = partition(Sigma,nrows = part, ncol = part)[[4]]
Delta0 = Sigma_12 %*% solve(Sigma_22)

Sstar = cov(df_s)
Sstar_11 = partition(Sstar,nrows = part, ncol = part)[[1]]
Sstar_12 = partition(Sstar,nrows = part, ncol = part)[[2]]
Sstar_21 = partition(Sstar,nrows = part, ncol = part)[[3]]
Sstar_22 = partition(Sstar,nrows = part, ncol = part)[[4]]


DeltaEst = Sstar_12 %*% solve(Sstar_22)
Sstar11_2 = Sstar_11 - Sstar_12 %*% solve(Sstar_22) %*% Sstar_21


T4_obs = det((DeltaEst-Delta0)%*%Sstar_22%*%t(DeltaEst-Delta0))/det(Sstar11_2)

T4 <- canodist(part = part, nsample = n_sample, pvariates = p, iterations = 10000)
q95 <- quantile(T4, 0.95)

T4_obs > q95 # FALSE means there is no statistical evidence to reject H0: Delta = Delta0
print(T4_obs)
print(q95)
# When the observed value is smaller than the 95% quantile,
# there is no statistical evidence to reject H0: Delta = Delta0.
#
# Note that the observed value is very close to zero.

[Package PSinference version 0.1.0 Index]