admix_test {admix} | R Documentation |
Hypothesis test between unknown components of the admixture models under study
Description
Performs a hypothesis test between the unknown components of a list of admixture models. Recall that the i-th admixture model has probability density function (pdf) l_i such that l_i = p_i * f_i + (1-p_i) * g_i, where g_i is the known component density. The unknown quantities p_i and f_i are estimated, leading to a test of the null hypothesis H0: f_i = f_j for all i != j against the alternative H1: there exists at least one pair i != j such that f_i differs from f_j. The test can be performed using two methods: either the comparison of coefficients obtained through polynomial basis expansions of the component densities, or the inner convergence property obtained using the IBM (Inversion Best Matching) approach. See 'Details' below for further information.
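The admixture density above can be illustrated with a minimal base R sketch. It does not use the admix package; the choices f = N(0, 1), g = N(2, 0.7) and p = 0.85 are assumptions made only for illustration (in practice f and p are unknown and estimated).

```r
# Minimal sketch (base R only): draw from l = p * f + (1 - p) * g.
# Assumed for illustration: f = N(0, 1), g = N(2, 0.7), p = 0.85.
set.seed(1)
n <- 1000
p <- 0.85

# Latent labels: TRUE when the observation is drawn from the unknown component f
from_f <- rbinom(n, size = 1, prob = p) == 1

x <- numeric(n)
x[from_f]  <- rnorm(sum(from_f),  mean = 0, sd = 1)    # draws from f
x[!from_f] <- rnorm(sum(!from_f), mean = 2, sd = 0.7)  # draws from the known g

# Pointwise admixture density l(x) = p * f(x) + (1 - p) * g(x)
l_density <- function(x) p * dnorm(x, 0, 1) + (1 - p) * dnorm(x, 2, 0.7)

# Sanity check: a valid density integrates to (approximately) 1
total_mass <- integrate(l_density, -Inf, Inf)$value
```

Only x and the known density g would be available to the test; the labels from_f are kept here purely to make the simulation transparent.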
Usage
admix_test(
samples = NULL,
sym.f = FALSE,
test.method = c("Poly", "ICV"),
sim_U = NULL,
n_sim_tab = 50,
comp.dist = NULL,
comp.param = NULL,
support = c("Real", "Integer", "Positive", "Bounded.continuous"),
ICV_tunePenalty = TRUE,
conf.level = 0.95,
parallel = FALSE,
n_cpu = 2
)
Arguments
samples |
A list of the K samples to be studied, all following admixture distributions. |
sym.f |
A boolean indicating whether the unknown component densities are assumed to be symmetric or not. |
test.method |
The testing method to be applied. Can be either 'Poly' (polynomial basis expansion) or 'ICV' (inner convergence from IBM). The same testing method is applied to all samples. In the one-sample case, only 'Poly' is available and the test is a Gaussianity test. For further details, see section 'Details' below. |
sim_U |
(Used only with the 'ICV' testing method, ignored otherwise) Random draws of the inner convergence part of the contrast as defined in the IBM approach (see 'Details' below). |
n_sim_tab |
(Used only with the 'ICV' testing method, ignored otherwise) Number of simulated Gaussian processes used in the tabulation of the inner convergence distribution in the IBM approach. |
comp.dist |
A list with 2*K elements corresponding to the component distributions (specified with R native names for these distributions) involved in the K admixture models. Elements, grouped by 2, refer to the unknown and known components of each admixture model. Unknown components must be specified as 'NULL' objects. For instance, 'comp.dist' could be specified as follows with K = 3: list(f1 = NULL, g1 = 'norm', f2 = NULL, g2 = 'norm', f3 = NULL, g3 = 'norm'). |
comp.param |
A list with 2*K elements corresponding to the parameters of the component distributions, each element being a list itself. The names used in this list must correspond to the native R argument names for these distributions. Elements, grouped by 2, refer to the parameters of the unknown and known components of each admixture model. Parameters of unknown components must be specified as 'NULL' objects. For instance, 'comp.param' could be specified as follows (with K = 3): list(f1 = NULL, g1 = list(mean=0,sd=1), f2 = NULL, g2 = list(mean=3,sd=1.1), f3 = NULL, g3 = list(mean=-2,sd=0.6)). |
support |
(Potentially used with the 'Poly' testing method, ignored otherwise) The support of the observations; one of "Real", "Integer", "Positive", or "Bounded.continuous". |
ICV_tunePenalty |
(defaults to TRUE) Boolean indicating whether to tune the penalty term in the k-sample test (k=2,3,...,K) when using the Inversion Best Matching (IBM) approach coupled with the Inner ConVergence (ICV) property. Particularly useful when studying samples that are unbalanced (in terms of sample size) or small. |
conf.level |
The confidence level of the K-sample test. |
parallel |
(defaults to FALSE) Boolean indicating whether computations are run in parallel (speeds up the tabulation). |
n_cpu |
(defaults to 2) Number of cores used when parallelizing. |
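The layout expected for 'comp.dist' and 'comp.param' can be sketched in base R as follows, here with K = 2. The consistency checks are not part of admix; they are an illustrative assumption about what a user might verify before calling admix_test(), relying only on the convention that R native names such as 'norm' map to density functions such as dnorm().

```r
# Minimal sketch (base R only): build and sanity-check the 2*K-element lists.
K <- 2

# Unknown components (f1, f2) are NULL; known components use R native names.
comp.dist  <- list(f1 = NULL, g1 = "norm", f2 = NULL, g2 = "norm")
comp.param <- list(f1 = NULL, g1 = list(mean = 0, sd = 1),
                   f2 = NULL, g2 = list(mean = 3, sd = 1.1))

# Both lists must hold 2*K elements (NULL entries are kept by list()).
stopifnot(length(comp.dist) == 2 * K, length(comp.param) == 2 * K)

# Illustrative check (not an admix function): parameter names of each known
# component must be arguments of the matching density function, e.g. dnorm().
known <- !vapply(comp.dist, is.null, logical(1))
for (nm in names(comp.dist)[known]) {
  dfun <- get(paste0("d", comp.dist[[nm]]), mode = "function")
  stopifnot(all(names(comp.param[[nm]]) %in% names(formals(dfun))))
}
```

With this check, a name such as 'rnorm' would fail, because no function called drnorm() exists.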
Details
For further details on the hypothesis testing techniques, see i) inner convergence through the IBM approach at https://hal.science/hal-03201760 ; ii) polynomial expansions in 'False Discovery Rate model Gaussianity test' (EJS, Pommeret & Vandekerkhove, 2017), or 'Semiparametric two-sample admixture components comparison test: the symmetric case' (JSPI, Milhaud et al., 2021).
Value
A list containing the decision of the test (reject or not), the confidence level at which the test is performed, the p-value of the test, and the value of the test statistic (which follows a chi-squared distribution with one degree of freedom under the null hypothesis).
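How a chi-squared statistic with one degree of freedom translates into a p-value and a decision can be sketched in base R. The statistic value below is hypothetical, and whether admix_test() computes its outputs in exactly this way is an assumption; the sketch only follows from the null distribution stated above.

```r
# Minimal sketch (base R only): decision rule for a chi-squared(1) statistic.
conf.level <- 0.95
stat <- 5.2   # hypothetical value of the test statistic

# p-value: upper tail probability of the chi-squared(1) distribution
p_value <- pchisq(stat, df = 1, lower.tail = FALSE)

# Critical value at the chosen confidence level
crit <- qchisq(conf.level, df = 1)

# Reject H0 when the statistic exceeds the critical value,
# which is equivalent to p_value < 1 - conf.level
reject <- stat > crit
```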
Author(s)
Xavier Milhaud xavier.milhaud.research@gmail.com
Examples
##### On a simulated example, with 1 sample (Gaussianity test):
library(admix)
list.comp <- list(f1 = "norm", g1 = "norm")
list.param <- list(f1 = list(mean = 0, sd = 1), g1 = list(mean = 2, sd = 0.7))
## Simulate data (both components are specified here, for simulation only):
sim1 <- rsimmix(n = 300, unknownComp_weight = 0.85,
                comp.dist = list(list.comp$f1, list.comp$g1),
                comp.param = list(list.param$f1, list.param$g1))$mixt.data
## Perform the hypothesis test (the unknown component f1 is now declared as NULL):
list.comp <- list(f1 = NULL, g1 = "norm")
list.param <- list(f1 = NULL, g1 = list(mean = 2, sd = 0.7))
gaussTest <- admix_test(samples = list(sim1), sym.f = TRUE, test.method = 'Poly', sim_U = NULL,
                        n_sim_tab = 50, comp.dist = list.comp, comp.param = list.param,
                        support = "Real", conf.level = 0.95, parallel = FALSE, n_cpu = 2)