ADtest {MSCsimtester}    R Documentation

Anderson-Darling test comparing sample and theoretical pairwise distance distributions.

Description

Takes as input theoretical pairwise distance densities under the MSC and empirical pairwise distances from gene trees in a sample, as returned by the function pairwiseDist. Uses the package kSamples to perform either one test on the entire dataset or multiple tests on subsamples.

Usage

ADtest(distanceDensities, subsampleSize = FALSE)

Arguments

distanceDensities

A list containing values needed for performing Anderson-Darling test(s) on a gene tree sample and species tree, as output by pairwiseDist. For details, see code for pairwiseDist.

subsampleSize

A positive integer to perform multiple tests on subsamples of that size, or FALSE (default) to perform one test on the full sample.

Details

The Anderson-Darling test compares the empirical distance distribution for a supplied gene tree sample to a sample drawn from the theoretical distribution. The output, passed from the kSamples package, therefore reports that two samples are being compared, testing the null hypothesis that they come from the same distribution. See the kSamples documentation for the function ad.test for more details.

Repeated runs of this function will give different results, since the sample from the theoretical distribution varies from run to run. Under the null hypothesis, p-values for different runs should be approximately uniformly distributed.
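The two-sample comparison and its run-to-run variation can be illustrated with a minimal sketch that calls ad.test from kSamples directly. The exponential draws and the helper one_run are hypothetical stand-ins chosen for illustration; they are not the MSC distance densities or the internals of ADtest.

```r
# Minimal sketch: two-sample Anderson-Darling test with kSamples.
# Assumption: exponential draws stand in for the real distance distributions.
library(kSamples)

set.seed(1)
empirical <- rexp(500, rate = 1)   # plays the role of the gene tree distances

# One full comparison; the printed output reports that 2 samples are compared
res <- ad.test(empirical, rexp(500, rate = 1))
print(res)

# Hypothetical helper: draw a fresh "theoretical" sample of the same size
# and return the asymptotic p-value (row 1, column 3 of the $ad matrix)
one_run <- function(x) {
  theoretical <- rexp(length(x), rate = 1)
  ad.test(x, theoretical)$ad[1, 3]
}

# Two runs on the same empirical sample give different p-values
p1 <- one_run(empirical)
p2 <- one_run(empirical)
c(p1, p2)
```

Because both samples here are drawn from the same distribution, the null hypothesis holds, and p-values collected over many such runs would be expected to look roughly uniform on [0, 1].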

Numerical issues may result in poor performance of Anderson-Darling tests when the sample size is very large, so the optional parameter subsampleSize can be set to split the sample into smaller subsamples. If subsampleSize is a positive integer, an Anderson-Darling test is performed on each subsample, comparing it to a random sample of the same size from the theoretical distribution. Good fit is indicated by an approximately uniform distribution of the subsample p-values.
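The subsampling idea can be sketched with base R alone. This is an illustration under stated assumptions, not the code used by ADtest: the data are synthetic exponential draws, the shuffle-and-split scheme is one plausible way to form subsamples, and the two-sample Kolmogorov-Smirnov test (ks.test) stands in for the Anderson-Darling test so that the sketch needs no extra packages.

```r
# Sketch of testing on subsamples and checking p-value uniformity.
# Assumptions: synthetic data; ks.test is a stand-in for the Anderson-Darling test.
set.seed(42)
distances     <- rexp(20000, rate = 1)   # hypothetical empirical distances
subsampleSize <- 1000

# Shuffle, then split into equal-sized subsamples (any remainder is dropped)
n_tests  <- floor(length(distances) / subsampleSize)
shuffled <- sample(distances)
groups   <- split(shuffled[seq_len(n_tests * subsampleSize)],
                  rep(seq_len(n_tests), each = subsampleSize))

# Compare each subsample with a fresh same-size draw from the reference distribution
p_values <- vapply(groups, function(x) {
  ks.test(x, rexp(subsampleSize, rate = 1))$p.value
}, numeric(1))

# Under a good fit the p-values should look roughly uniform on [0, 1]
print(summary(p_values))
uniformity <- ks.test(p_values, "punif")
print(uniformity$p.value)
```

A histogram of p_values is another quick way to judge uniformity; a pile-up near zero would suggest that the sample does not fit the reference distribution.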

Value

An object of type ADtestOutput with two components: $Sample, a sample from the theoretical distance distribution of the same size as the empirical one, and $ADtest. If only one test was performed, $ADtest is of type kSamples and holds all output from the Anderson-Darling test; if tests were performed on subsamples, it is the number of tests.

See Also

pairwiseDist, kSamples-package

Examples

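library(ape)            # provides read.tree
library(MSCsimtester)
# Species tree and population sizes
stree=read.tree(text="((((a:10000,b:10000):10000,c:20000):10000,d:30000):10000,e:40000);")
pops=c(15000,25000,10000,1,1,1,1,1,12000)
# Gene tree sample shipped with the package
gts=read.tree(file=system.file("extdata","genetreeSample",package="MSCsimtester"))
# Theoretical and empirical distances between taxa a and b
distDen=pairwiseDist(stree,pops,gts,"a","b")
ADtest(distDen)         # one test on the full sample
ADtest(distDen,1000)    # multiple tests on subsamples of size 1000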


[Package MSCsimtester version 1.0.0 Index]