discr.test.two_sample {mgc}    R Documentation

Discriminability Two Sample Permutation Test

Description

A function that takes two sets of paired data and tests whether the first set is more, less, or non-equally discriminable relative to the second.

Usage

discr.test.two_sample(
  X1,
  X2,
  Y,
  dist.xfm = mgc.distance,
  dist.params = list(method = "euclidean"),
  dist.return = NULL,
  remove.isolates = TRUE,
  nperm = 500,
  no_cores = 1,
  alt = "greater"
)

Arguments

X1

is interpreted as a [n x d] data matrix with n samples in d dimensions. Should NOT be a distance matrix.

X2

is interpreted as a [n x d] data matrix with n samples in d dimensions. Should NOT be a distance matrix.

Y

[n] a vector containing the sample ids for our n samples. Should be matched such that Y[i] is the corresponding label for X1[i,] and X2[i,].

dist.xfm

a distance function used to transform X1 and X2 into distance matrices. It should accept an [n x d] matrix of n samples in d dimensions and return an [n x n] distance matrix, either directly or as an element of its return value selected through dist.return (for example, $D). See mgc.distance for details.

dist.params

a list of trailing arguments to pass to the distance function specified in dist.xfm. Defaults to list(method='euclidean').

dist.return

the return argument for the specified dist.xfm containing the distance matrix. Defaults to NULL.

is.null(dist.return)

use the return argument directly from dist.xfm as the distance matrix. Should be a [n x n] matrix.

is.character(dist.return) | is.integer(dist.return)

use the element selected by [[dist.return]] from the value returned by dist.xfm as the distance matrix. Should be a [n x n] matrix.
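The interplay of dist.xfm, dist.params and dist.return can be illustrated with a minimal sketch. The function my.distance and its return fields are hypothetical names chosen for illustration; they are not part of mgc. The sketch only shows the shape such a function might take and what selecting an element by name amounts to.

```r
# Hypothetical custom distance function (illustrative, not part of mgc).
# It accepts an [n x d] matrix plus trailing arguments and returns a list
# whose element D holds the [n x n] distance matrix.
my.distance <- function(X, method = "euclidean") {
  D <- as.matrix(stats::dist(X, method = method))
  list(D = D, method = method)
}

set.seed(1)
X <- matrix(rnorm(20), nrow = 10, ncol = 2)

# Trailing arguments, as they would be supplied through dist.params.
params <- list(method = "manhattan")
res <- do.call(my.distance, c(list(X), params))

# Selecting the distance matrix by name, as dist.return = "D" would.
D <- res[["D"]]
dim(D)
```

Under these assumptions, a corresponding call might look like discr.test.two_sample(X1, X2, Y, dist.xfm = my.distance, dist.params = list(method = "manhattan"), dist.return = "D").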

remove.isolates

remove isolated samples from the dataset. Isolated samples are samples with only one instance of their class appearing in the Y vector. Defaults to TRUE.
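The definition of an isolated sample can be sketched in base R as follows. This is an illustration of the definition above, not the code mgc uses internally; the variable names are assumptions. Since X1 and X2 are paired, the same logical mask would presumably apply to both.

```r
# Sample ids; "c" and "d" each appear only once, so they are isolates.
Y <- c("a", "a", "b", "b", "b", "c", "d")
X <- matrix(seq_len(14), nrow = 7, ncol = 2)

# Count the instances of each id and flag ids that occur exactly once.
counts <- table(Y)
isolated.ids <- names(counts)[counts == 1]
keep <- !(Y %in% isolated.ids)

# Drop the isolated samples from both the data and the id vector.
X.kept <- X[keep, , drop = FALSE]
Y.kept <- Y[keep]
Y.kept
```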

nperm

the number of permutations for the permutation test. Defaults to 500.

no_cores

the number of cores to use for the permutations. Defaults to 1.

alt

the alternative hypothesis. Can be that the first dataset is more discriminable (alt = 'greater'), less discriminable (alt = 'less'), or just non-equal (alt = 'neq'). Defaults to "greater".

Value

A list containing the following:

stat

the observed test statistic. The test statistic is the difference in discriminability of X1 vs X2.

discr

the discriminabilities for each of the two data sets, as a list.

null

the null distribution of the test statistic, computed via permutation.

p.value

The p-value associated with the test.

alt

The alternative hypothesis for the test.
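To make the stat and discr entries concrete, the following is a naive sketch of a sample discriminability and of their difference. It assumes discriminability is estimated as the fraction of comparisons in which a within-id distance is strictly smaller than a between-id distance from the same sample. The function naive.discr is hypothetical, and the way mgc handles ties and computes the statistic internally may differ.

```r
# Naive sample discriminability (illustrative; not the mgc implementation).
naive.discr <- function(X, Y) {
  D <- as.matrix(stats::dist(X))
  hits <- 0
  total <- 0
  for (i in seq_along(Y)) {
    same <- setdiff(which(Y == Y[i]), i)  # other samples with the same id
    other <- which(Y != Y[i])             # samples with a different id
    for (j in same) {
      # Count between-id distances that exceed this within-id distance.
      hits <- hits + sum(D[i, j] < D[i, other])
      total <- total + length(other)
    }
  }
  hits / total
}

# Toy data: two ids with three samples each.
Y <- rep(1:2, each = 3)
X.clean <- matrix(c(0, 0.1, 0.2, 5, 5.1, 5.2), ncol = 1)  # ids well separated
X.mixed <- matrix(c(0, 5, 0.2, 0.1, 5.1, 5.2), ncol = 1)  # ids overlap

d1 <- naive.discr(X.clean, Y)
d2 <- naive.discr(X.mixed, Y)
stat <- d1 - d2  # difference in discriminability, first minus second
```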

Details

Performs a two-sample test of whether the discriminability of one dataset differs from that of another, as described in Bridgeford et al. (2019). With \hat D_{X_1} the sample discriminability of one approach, and \hat D_{X_2} the sample discriminability of another approach, the null hypothesis is:

H_0: D_{X_1} = D_{X_2}

and the default alternative hypothesis is:

H_A: D_{X_1} > D_{X_2}

Tests of the alternatives D_{X_1} < D_{X_2} and D_{X_1} \neq D_{X_2} are also implemented.
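How a p-value might be derived from an observed statistic and a permutation null under each alternative can be sketched as below. The function perm.p.value is hypothetical, and the add-one correction is a common convention for permutation tests assumed here; the exact formula used by mgc is not stated in this page.

```r
# Permutation p-value from an observed statistic and a null sample
# (illustrative sketch; the add-one correction is an assumption).
perm.p.value <- function(stat, null, alt = c("greater", "less", "neq")) {
  alt <- match.arg(alt)
  extreme <- switch(alt,
    greater = null >= stat,            # H_A: D_{X_1} > D_{X_2}
    less    = null <= stat,            # H_A: D_{X_1} < D_{X_2}
    neq     = abs(null) >= abs(stat))  # H_A: D_{X_1} != D_{X_2}
  (sum(extreme) + 1) / (length(null) + 1)
}

# Made-up null values, for illustration only.
null <- c(-0.2, -0.1, 0, 0.1, 0.2, 0, -0.05, 0.05, 0.15)
stat <- 0.3

p.greater <- perm.p.value(stat, null, "greater")
p.less <- perm.p.value(stat, null, "less")
p.neq <- perm.p.value(stat, null, "neq")
```

With an observed statistic larger than every null value, the "greater" and "neq" p-values are as small as the number of permutations allows, while the "less" p-value equals 1.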

Author(s)

Eric Bridgeford

References

Eric W. Bridgeford, et al. "Optimal Decisions for Reference Pipelines and Datasets: Applications in Connectomics." bioRxiv (2019).

Examples

## Not run: 
library(mgc)
library(MASS)  # provides mvrnorm

set.seed(1234)  # make the simulation reproducible
n <- 100; d <- 2  # n samples in total (n/2 per subject), d dimensions

# generate the true means of two subjects; true difference between
# subject 1 (column 1) and subject 2 (column 2)
mus <- cbind(rep(0, d), rep(1, d))
Sigma <- diag(d)  # dimensions are independent

# first dataset X1 contains less noise than X2
X1 <- do.call(rbind, lapply(seq_len(ncol(mus)),
  function(k) {mvrnorm(n=n/2, mus[,k], 0.5*Sigma)}))
X2 <- do.call(rbind, lapply(seq_len(ncol(mus)),
  function(k) {mvrnorm(n=n/2, mus[,k], 2*Sigma)}))
# sample ids: the first n/2 rows belong to subject 1, the rest to subject 2
Y <- rep(seq_len(ncol(mus)), each=n/2)

# X1 should be more discriminable, as it contains less noise
discr.test.two_sample(X1, X2, Y, alt="greater")$p.value  # p-value should be small

## End(Not run)

[Package mgc version 2.0.2 Index]