
scMANOVA {semicontMANOVA}

R Documentation

Multivariate ANalysis Of VAriance Inference and Test with Ridge Regularization for Semicontinuous High-Dimensional Data

Description

scMANOVA performs Multivariate ANalysis Of VAriance (MANOVA) inference and testing with ridge regularization in the presence of semicontinuous high-dimensional data. The test is based on a Likelihood Ratio Test statistic, and the p-value can be computed either from the asymptotic distribution (p.value.perm = FALSE) or by a permutation procedure (p.value.perm = TRUE). The regularization parameters can be supplied as input or chosen through an optimization procedure.

Usage

scMANOVA(x, n, lambda = NULL, lambda0 = NULL, lambda.step = 0.1,
  ident = FALSE, tol = 1e-08, penalty = function(n, p) log(n),
  B = 500, p.value.perm = FALSE, fixed.lambda = FALSE, ...)

Arguments

`x`	`data.frame` or `matrix` of data with units on the rows and variables on the columns
`n`	`vector`. The length corresponds to the number of groups, the elements to the number of observations in each group
`lambda`	`NULL`, a scalar or a `vector` of length 2. Ridge regularization parameter. If `NULL`, the optimal value of `lambda` is searched in the interval [0,100]; if a vector of length 2, it is searched in the specified interval; if a scalar, it is used as the optimal value
`lambda0`	`NULL`, a scalar or a `vector` of length 2. Ridge regularization parameter under the null hypothesis. If `NULL`, the optimal value of `lambda0` is searched in the interval [0,100]; if a vector of length 2, it is searched in the specified interval; if a scalar, it is used as the optimal value
`lambda.step`	scalar. Step size used in the optimization procedure to find the smallest value of `lambda` (and `lambda0`) that makes the covariance matrices, under the alternative and under the null hypotheses, nonsingular
`ident`	`logical`. If `TRUE`, `lambda` times the identity matrix is added to the raw estimated covariance matrix; if `FALSE`, the diagonal values of the raw estimated covariance matrix are used instead
`tol`	scalar. Tolerance used in the optimization procedure to find the smallest value of `lambda` (and `lambda0`) that makes the covariance matrices, under the alternative and under the null hypotheses, nonsingular
`penalty`	`function` with two arguments: sample size (`n`) and number of variables (`p`) used as penalty function in the definition of the Information Criterion to select the optimal values for `lambda` and `lambda0`
`B`	scalar. Number of permutations to run in the permutation test
`p.value.perm`	`logical`. If `TRUE` a p-value from a permutation test is evaluated, otherwise an asymptotic value is reported
`fixed.lambda`	`logical`. If `TRUE`, the optimal values for `lambda` and `lambda0` are evaluated just once for the observed dataset and kept fixed during the permutation test; otherwise, optimal values are evaluated for each permuted dataset
`...`	further parameters passed to function `scMANOVApermTest`
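
To make the role of `lambda` and `ident` more concrete, here is a minimal base-R sketch of ridge regularization of a singular sample covariance matrix. It is an illustration only, not the package's internal code: the names `S`, `S_ident` and `S_diag` are hypothetical, and the `ident = FALSE` form shown here (adding `lambda` times the diagonal of the raw covariance) is an assumption based on the argument description above.

```r
# Illustrative sketch only; not the internal code of scMANOVA.
set.seed(1)
n_obs <- 5                      # fewer observations than variables
p <- 20
X <- matrix(rnorm(n_obs * p), nrow = n_obs)

S <- cov(X)                     # raw covariance, singular because p > n_obs
lambda <- 0.5                   # arbitrary illustrative value

# ident = TRUE: add lambda times the identity matrix
S_ident <- S + lambda * diag(p)

# ident = FALSE (assumed form): add lambda times the diagonal of S
S_diag <- S + lambda * diag(diag(S))

# Smallest eigenvalue of each matrix; the regularized ones are strictly positive
min_eig <- function(m) min(eigen(m, symmetric = TRUE, only.values = TRUE)$values)
print(c(raw = min_eig(S), ident = min_eig(S_ident), diag = min_eig(S_diag)))
```

In this sketch both regularized matrices are invertible even though the raw estimate is not, which is the property the `lambda.step` and `tol` search is described as looking for.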

Value

An object of class scMANOVA, which is a list with the following components:

`pi`	`matrix`. Estimated proportion of missing values for each group
`mu`	`matrix`. Estimated mean vector for each group
`sigmaRidge`	`matrix`. Estimated covariance matrix with ridge regularization
`sigma`	`matrix`. Estimated covariance matrix by maximum likelihood
`pi0`	`vector`. Estimated proportion of missing values under the null hypothesis
`mu0`	`vector`. Estimated mean vector under the null hypothesis
`sigma0Ridge`	`matrix`. Estimated covariance matrix with ridge regularization under null hypothesis
`sigma0`	`matrix`. Estimated covariance matrix by maximum likelihood under null hypothesis
`removed.vars`	`vector` or `NULL`. Columns removed from the continuous part of the log-likelihood due to an insufficient number of observations in each group
`logLikPi`	scalar. Log-likelihood for the discrete part of the model
`logLik`	scalar. Log-likelihood
`logLikPi0`	scalar. Log-likelihood for the discrete part of the model under the null hypothesis
`logLik0`	scalar. Log-likelihood under null hypothesis
`statistic`	scalar. Wilks statistic
`lambda`	scalar. Regularization parameter
`lambda0`	scalar. Regularization parameter under null hypothesis
`df`	scalar. Model degrees of freedom
`df0`	scalar. Model degrees of freedom under null hypothesis
`aic`	scalar. Information criterion
`aic0`	scalar. Information criterion under null hypothesis
`p.value`	scalar. p-value of the Wilks statistic
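
Because the returned object is a list, its components can be read with `$`. The following is a minimal sketch, assuming the semicontMANOVA package is installed; it reuses the simulated data from the Examples section and only reads components listed above.

```r
# Sketch: reading components of the fitted object (runs only if the package is available).
if (requireNamespace("semicontMANOVA", quietly = TRUE)) {
  library(semicontMANOVA)
  set.seed(1234)
  n <- c(5, 5)
  x <- scMANOVAsimulation(n = n, p = 20, pmiss = 0.1)
  res <- scMANOVA(x = x, n = n)

  print(res$statistic)                                  # Wilks statistic
  print(res$p.value)                                    # asymptotic p-value
  print(c(lambda = res$lambda, lambda0 = res$lambda0))  # selected regularization parameters
  print(res$removed.vars)                               # NULL if no column was removed
}
```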

Author(s)

Elena Sabbioni, Claudio Agostinelli and Alessio Farcomeni

References

Elena Sabbioni, Claudio Agostinelli and Alessio Farcomeni (2024) A regularized MANOVA test for semicontinuous high-dimensional data. arXiv: http://arxiv.org/abs/2401.04036

Examples

  set.seed(1234)
  n <- c(5,5)
  p <- 20
  pmiss <- 0.1
  x <- scMANOVAsimulation(n=n, p=p, pmiss=pmiss)
  res.asy <- scMANOVA(x=x, n=n) # Asymptotic p-value
  res.asy

  res.perm <- scMANOVA(x=x, n=n, p.value.perm=TRUE) # p-value by permutation test
  res.perm
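
The idea behind the permutation option (`p.value.perm = TRUE`, with `B` permutations) can be pictured with a generic base-R sketch. This is not the package's `scMANOVApermTest` implementation: the helper names `perm_p_value` and `mean_gap` are hypothetical, the toy statistic is not the regularized likelihood ratio, and the rows of `x` are assumed to be ordered by group as described by `n`.

```r
# Generic permutation p-value sketch; hypothetical helpers, not the package's code.
perm_p_value <- function(x, n, stat_fun, B = 500) {
  groups <- rep(seq_along(n), times = n)       # group label for each row of x
  observed <- stat_fun(x, groups)
  # Recompute the statistic after randomly reassigning the group labels
  permuted <- replicate(B, stat_fun(x, sample(groups)))
  (1 + sum(permuted >= observed)) / (B + 1)
}

# Toy statistic: between-group variance of the group means, summed over variables
mean_gap <- function(x, groups) {
  centers <- rowsum(x, groups) / as.vector(table(groups))
  sum(apply(centers, 2, var))
}

set.seed(42)
n <- c(5, 5)
x <- matrix(rnorm(sum(n) * 3), nrow = sum(n))
x[6:10, ] <- x[6:10, ] + 5                     # shift the second group
p_perm <- perm_p_value(x, n, mean_gap, B = 500)
print(p_perm)
```

Adding 1 to both numerator and denominator keeps the estimated p-value strictly positive, a common convention for permutation tests.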

[Package semicontMANOVA version 0.1-8 Index]