
discr.test.two_sample {mgc}

R Documentation

Discriminability Two Sample Permutation Test

Description

A function that takes two datasets whose rows are paired (the same `n` samples measured two ways) and tests whether the first dataset is more discriminable than, less discriminable than, or not equally discriminable to the second.

Usage

discr.test.two_sample(
  X1,
  X2,
  Y,
  dist.xfm = mgc.distance,
  dist.params = list(method = "euclidian"),
  dist.return = NULL,
  remove.isolates = TRUE,
  nperm = 500,
  no_cores = 1,
  alt = "greater"
)

Arguments

`X1`	is interpreted as a `[n x d]` data matrix with `n` samples in `d` dimensions. Should NOT be a distance matrix.
`X2`	is interpreted as a `[n x d]` data matrix with `n` samples in `d` dimensions. Should NOT be a distance matrix.
`Y`	`[n]` a vector containing the sample ids for our `n` samples. Should be matched such that `Y[i]` is the corresponding label for `X1[i,]` and `X2[i,]`.
`dist.xfm`	a distance function used to transform `X1` and `X2`. It should accept an `[n x d]` matrix of `n` samples in `d` dimensions and return a `[n x n]` distance matrix, either directly or as a named element of its return value (such as `$D`) that is selected with `dist.return`. See mgc.distance for details.
`dist.params`	a list of trailing arguments to pass to the distance function specified in `dist.xfm`. Defaults to `list(method='euclidean')`.
`dist.return`	the return argument of `dist.xfm` that contains the distance matrix. Defaults to `NULL`. If `is.null(dist.return)`, the value returned by `dist.xfm` is used directly as the distance matrix. If `is.character(dist.return) | is.integer(dist.return)`, the element `[[dist.return]]` of the value returned by `dist.xfm` is used as the distance matrix. In either case, the distance matrix should be a `[n x n]` matrix.
`remove.isolates`	remove isolated samples from the dataset. Isolated samples are samples with only one instance of their class appearing in the `Y` vector. Defaults to `TRUE`.
`nperm`	the number of permutations for the permutation test. Defaults to `500`.
`no_cores`	the number of cores to use for the permutations. Defaults to `1`.
`alt`	the alternative hypothesis. Can be that the first dataset is more discriminable (`alt = 'greater'`), less discriminable (`alt = 'less'`), or not equally discriminable (`alt = 'neq'`). Defaults to `"greater"`.
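The `dist.xfm`, `dist.params` and `dist.return` arguments work together. As a hedged sketch, the block below defines a hypothetical transform `minkowski.xfm` (not part of mgc) that wraps base R's `dist` and returns its matrix inside a list, so that `dist.return = "D"` selects it and `dist.params = list(p = 1)` is forwarded as a trailing argument. It assumes, as the argument descriptions above state, that `dist.xfm` is called on each data matrix with the entries of `dist.params` appended; the package call is guarded so the rest of the block runs without mgc installed.

```r
# Hypothetical distance transform (not part of mgc): Minkowski distances,
# returned inside a list so that dist.return = "D" is needed to extract them.
minkowski.xfm <- function(X, p = 2) {
  list(D = as.matrix(dist(X, method = "minkowski", p = p)))
}

# small simulated data set: 2 subjects with 10 measurements each;
# rnorm recycles the 20 means in Y, so every column of row i has mean Y[i]
set.seed(1)
Y <- rep(1:2, each = 10)
X1 <- matrix(rnorm(20 * 3, mean = Y), ncol = 3)
X2 <- matrix(rnorm(20 * 3, mean = Y, sd = 2), ncol = 3)

# the [n x n] matrix that dist.return = "D" would select (p = 1: Manhattan)
D <- minkowski.xfm(X1, p = 1)$D

if (requireNamespace("mgc", quietly = TRUE)) {
  res <- mgc::discr.test.two_sample(X1, X2, Y,
    dist.xfm = minkowski.xfm,
    dist.params = list(p = 1),  # forwarded to minkowski.xfm
    dist.return = "D",          # element of the returned list to use
    nperm = 100)
  print(res$p.value)
}
```

Returning a list rather than a bare matrix is only there to exercise `dist.return`; a transform that returns the matrix directly would be used with the default `dist.return = NULL`.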

Value

A list containing the following:

`stat`	the observed test statistic: the difference between the discriminability of `X1` and that of `X2`.
`discr`	the discriminabilities for each of the two data sets, as a list.
`null`	the null distribution of the test statistic, computed via permutation.
`p.value`	The p-value associated with the test.
`alt`	The alternative hypothesis for the test.
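In a permutation test, the p-value compares the observed `stat` with the permutation `null`. As an illustration only, the hypothetical helper `perm.pvalue` (not part of mgc) shows one common way to do this for each `alt` setting. The `+ 1` correction and the absolute-value rule for `"neq"` (which assumes a null centred at zero) are assumptions, and the package's own convention may differ.

```r
# Hypothetical helper (not part of mgc): permutation p-value for an observed
# statistic `stat` given a numeric vector `null` of permuted statistics.
perm.pvalue <- function(stat, null, alt = c("greater", "less", "neq")) {
  alt <- match.arg(alt)
  extreme <- switch(alt,
    greater = null >= stat,           # X1 more discriminable than X2
    less    = null <= stat,           # X1 less discriminable than X2
    neq     = abs(null) >= abs(stat)  # two-sided, assuming a null centred at 0
  )
  # add one to numerator and denominator so the p-value is never exactly zero
  (sum(extreme) + 1) / (length(null) + 1)
}

null <- c(-0.1, 0, 0.05, 0.2)
p.greater <- perm.pvalue(0.1, null, "greater")
p.less    <- perm.pvalue(0.1, null, "less")
p.neq     <- perm.pvalue(0.1, null, "neq")
```

Here one of the four null values is at least `0.1`, so the `"greater"` p-value is `(1 + 1) / (4 + 1) = 0.4`.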

Details

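A function that performs a two-sample test of whether the discriminability of one dataset differs from that of another, as described in Bridgeford et al. (2019). With $\hat{D}_{X_1}$ the sample discriminability of one approach and $\hat{D}_{X_2}$ the sample discriminability of another approach, and $D_{X_1}$, $D_{X_2}$ the corresponding population discriminabilities, the default test (`alt = "greater"`) is of:

$$H_0: D_{X_1} = D_{X_2}$$

against:

$$H_A: D_{X_1} > D_{X_2}.$$

Also implemented are tests of $H_A: D_{X_1} < D_{X_2}$ (`alt = "less"`) and $H_A: D_{X_1} \neq D_{X_2}$ (`alt = "neq"`).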
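To make the compared quantity concrete, here is a minimal base-R sketch of a sample discriminability. For every pair of distinct measurements of the same subject, it takes the fraction of measurements of other subjects that lie strictly farther away, then averages over pairs. This follows the intuition in Bridgeford et al. (2019) but is not the package's implementation: `sample.discr` is a hypothetical name, ties count as failures, and details such as isolate handling may differ.

```r
# Hypothetical helper (not part of mgc): sample discriminability of a
# distance matrix D with labels Y. For every ordered pair (i, j) of distinct
# samples sharing a label, compute the fraction of samples k with a different
# label for which D[i, k] > D[i, j]; average these fractions over all pairs.
# Ties count as failures; samples with a unique label form no pairs.
sample.discr <- function(D, Y) {
  n <- length(Y)
  per.pair <- unlist(lapply(seq_len(n), function(i) {
    same  <- which(Y == Y[i] & seq_len(n) != i)  # other measurements of i's subject
    other <- which(Y != Y[i])                    # measurements of other subjects
    vapply(same, function(j) mean(D[i, other] > D[i, j]), numeric(1))
  }))
  mean(per.pair)
}

# simulated data mirroring the Examples: X1 is less noisy than X2
set.seed(1)
Y <- rep(1:2, each = 20)
mu <- c(0, 1)[Y]  # per-sample mean, shared by both dimensions
X1 <- cbind(rnorm(40, mu, sqrt(0.5)), rnorm(40, mu, sqrt(0.5)))
X2 <- cbind(rnorm(40, mu, sqrt(2)), rnorm(40, mu, sqrt(2)))

D1.hat <- sample.discr(as.matrix(dist(X1)), Y)
D2.hat <- sample.discr(as.matrix(dist(X2)), Y)
stat <- D1.hat - D2.hat  # positive values point towards alt = "greater"
```

In this sketch, `stat` plays the role of the returned `stat`; the permutation step that produces `null` is not reproduced here.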

Author(s)

Eric Bridgeford

References

Eric W. Bridgeford, et al. "Optimal Decisions for Reference Pipelines and Datasets: Applications in Connectomics." bioRxiv (2019).

Examples

## Not run: 
require(mgc)
require(MASS)

n <- 50  # samples per subject

# true means of the two subjects: subject 1 (column 1)
# and subject 2 (column 2) differ
mus <- cbind(c(0, 0), c(1, 1))
Sigma <- diag(2)  # dimensions are independent

# first dataset X1 contains less noise than X2
X1 <- do.call(rbind, lapply(1:ncol(mus),
  function(k) mvrnorm(n = n, mus[, k], 0.5 * Sigma)))
X2 <- do.call(rbind, lapply(1:ncol(mus),
  function(k) mvrnorm(n = n, mus[, k], 2 * Sigma)))
Y <- rep(1:ncol(mus), each = n)  # subject labels, matched to the rows of X1 and X2

# X1 should be more discriminable, as it contains less noise
discr.test.two_sample(X1, X2, Y, alt = "greater")$p.value  # p-value is small

## End(Not run)

[Package mgc version 2.0.2 Index]