One-sample gaussianity test in admixture models using the Bordes and Vandekerkhove estimation method

Description

Performs the hypothesis test of whether the unknown mixture component is gaussian, given that the known component has support on the real line (R). A non-gaussian known component can nevertheless be handled through the basic transformation by cdf. Recall that an admixture model has probability density function (pdf) l = p*f + (1-p)*g, where g is the known pdf and l is observed (p and f are unknown). The unknown parameters are estimated by optimization, as defined by Bordes & Vandekerkhove (2010), which requires the unknown mixture component to have a symmetric density.

Usage

gaussianity_test(
sample1,
comp.dist,
comp.param,
K = 3,
lambda = 0.2,
support = c("Real", "Integer", "Positive", "Bounded.continuous")
)

Arguments

sample1     Observed sample with mixture distribution given by l = p*f + (1-p)*g, where f and p are unknown and g is known.

comp.dist   List with two elements corresponding to the component distributions involved in the admixture model. Unknown elements must be specified as 'NULL' objects. For instance, if 'f' is unknown: list(f = NULL, g = 'norm').

comp.param  List with two elements corresponding to the parameters of the component distributions, each element being a list itself. The names used in this list must correspond to the native R names for distributions. Unknown elements must be specified as 'NULL' objects (e.g. if 'f' is unknown: list(f=NULL, g=list(mean=0,sd=1))).

K           Number of coefficients considered for the polynomial basis expansion.

lambda      Rate at which the normalization factor is set in the penalization rule for model selection (in ]0,1/2[). See 'Details' below.

support     Support of the densities under consideration, used to choose the polynomial orthonormal basis. One of 'Real', 'Integer', 'Positive', or 'Bounded.continuous'.

Details

See the paper 'False Discovery Rate model Gaussianity test' (Pommeret & Vandekerkhove, 2017).

Value

A list of 6 elements, containing: 1) the rejection decision; 2) the p-value of the test; 3) the test statistic; 4) the variance-covariance matrix of the test statistic; 5) the selected rank for testing; and 6) a list of the estimates (unknown component weight 'p', shift location parameter 'mu' and standard deviation 's' of the symmetric unknown distribution).

Author(s)

Xavier Milhaud xavier.milhaud.research@gmail.com

Examples

####### Under the null hypothesis H0.
## Parameters of the gaussian distribution to be tested:
list.comp <- list(f = "norm", g = "norm")
list.param <- list(f = c(mean = 2, sd = 0.5),
                   g = c(mean = 0, sd = 1))
## Simulate and plot the data at hand:
obs.data <- rsimmix(n = 150, unknownComp_weight = 0.9, comp.dist = list.comp,
                    comp.param = list.param)[['mixt.data']]
plot(density(obs.data))
## Perform the test (the component f is now declared unknown):
list.comp <- list(f = NULL, g = "norm")
list.param <- list(f = NULL, g = c(mean = 0, sd = 1))
gaussianity_test(sample1 = obs.data, comp.dist = list.comp, comp.param = list.param,
                 K = 3, lambda = 0.1, support = 'Real')
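
####### Illustration of the admixture model l = p*f + (1-p)*g (base R only).
## A minimal sketch, not part of the package: it draws a sample from the
## admixture model by hand, assuming f = N(2, 0.5^2), g = N(0, 1) and p = 0.9
## (the same settings as in the H0 example above). The names 'n.sim', 'p.true',
## 'from.f', 'sim.data' and 'l.pdf' are hypothetical, for illustration only.
set.seed(1)
n.sim <- 150
p.true <- 0.9
## Each observation is drawn from f with probability p, and from g otherwise:
from.f <- rbinom(n.sim, size = 1, prob = p.true) == 1
sim.data <- ifelse(from.f,
                   rnorm(n.sim, mean = 2, sd = 0.5),
                   rnorm(n.sim, mean = 0, sd = 1))
## Mixture pdf l evaluated at x:
l.pdf <- function(x) {
  p.true * dnorm(x, mean = 2, sd = 0.5) + (1 - p.true) * dnorm(x, mean = 0, sd = 1)
}
## Compare the kernel density estimate (solid) with the mixture pdf l (dashed):
plot(density(sim.data))
curve(l.pdf(x), add = TRUE, lty = 2)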