chargaff.gibbs.test {spgs}    R Documentation

Test of CSPR for Dinucleotides Under Gibbs Distribution

Description

Performs the test of Chargaff's second parity rule (CSPR) for dinucleotides proposed in Hart and Martínez (2012), which assumes that the DNA sequence follows a Gibbs distribution.

Usage

chargaff.gibbs.test(x, maxLag=200)

Arguments

x

either a character vector representing a DNA sequence in which each element contains a single nucleotide, or a DNA sequence stored using the SeqFastadna class from the seqinr package.

maxLag

the maximum number of lags (cylinder lengths) to use in computing variances. The default value is 200.
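
As a minimal sketch of preparing input for x, a sequence held as one string can be split into a vector of single nucleotides with base R. The string raw.seq is a made-up toy value, and lower-casing is an assumption made here only to match the lower-case states used in the Examples section.

```r
# Toy sequence held as a single string (hypothetical data, for illustration only).
raw.seq <- "ACGTTGCAAC"

# Split into one nucleotide per element; lower-casing is an assumption
# chosen to match the lower-case states used in the Examples.
x <- tolower(strsplit(raw.seq, split = "")[[1]])

length(x)            # number of nucleotides in the vector
all(nchar(x) == 1)   # checks that every element holds a single character
```

A vector built this way could then be passed as x, with maxLag left at its default or set explicitly, for example chargaff.gibbs.test(x, maxLag=100). Since the test is asymptotic, a real analysis would presumably use a far longer sequence than this toy one.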

Details

This function performs a test of Chargaff's second parity rule for dinucleotides, assuming that the DNA sequence was generated by a Gibbs distribution. Under the null hypothesis, the test statistic \eta asymptotically follows a \chi^2 distribution with 5 degrees of freedom.

The test is set up as follows:

H_0: the sequence complies with CSPR for dinucleotides
H_1: the sequence does not comply with CSPR for dinucleotides
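
Because \eta is asymptotically \chi^2 with 5 degrees of freedom under H_0, a p-value is the upper-tail probability of that distribution at the observed statistic. The sketch below illustrates this calculation with base R only; the value of eta and the 5% significance level are illustrative assumptions, not output of the function.

```r
# Hypothetical observed value of the test statistic (illustrative only).
eta <- 15

# Upper-tail probability of a chi-squared distribution with 5 degrees of freedom.
p.value <- pchisq(eta, df = 5, lower.tail = FALSE)

# Reject H_0 at the (assumed) 5% level when the p-value falls below alpha.
alpha <- 0.05
reject <- p.value < alpha

print(p.value)
print(reject)
```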

Value

A list with class "htest" containing the following components:

statistic

the value of the test statistic.

p.value

the p-value of the test.

method

a character string indicating what type of test was performed.

data.name

a character string giving the name of the data.

FHat

the 5-element vector n\hat F used in calculating the test statistic.

pairs

the stochastic matrix of dinucleotide counts used to derive n\hat F.

v

the asymptotic covariance matrix of n\hat F.

n

the length of the DNA sequence.

cutoff

the actual number of lags used by the algorithm to calculate covariances.

maxCutoff

the value specified for the maxLag parameter when the test was performed.
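
The components listed above can be read from the returned object with the usual list accessor. The following sketch assumes that the spgs package is installed and reuses the nanoarchaeum data set from the Examples; the variable name res is arbitrary, and the guard that skips the code when the package is unavailable was added for this sketch.

```r
# Run only if spgs is installed (guard added for this sketch).
if (requireNamespace("spgs", quietly = TRUE)) {
  library(spgs)
  data(nanoarchaeum)

  res <- chargaff.gibbs.test(nanoarchaeum)

  print(res$statistic)   # value of the test statistic
  print(res$p.value)     # p-value of the test
  print(res$n)           # length of the DNA sequence
  print(res$cutoff)      # number of lags actually used
  print(res$maxCutoff)   # value supplied for maxLag
}
```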

Author(s)

Andrew Hart and Servet Martínez

References

Hart, A.G. and Martínez, S. (2012) A Gibbs approach to Chargaff's second parity rule. J. Stat. Phys. 146(2), 408-422.

See Also

chargaff0.test, chargaff1.test, chargaff2.test, agct.test, ag.test

Examples

# Demonstration on a real archaeal sequence (Nanoarchaeum equitans)
data(nanoarchaeum)
chargaff.gibbs.test(nanoarchaeum)

# Simulate a synthetic DNA sequence that does not satisfy Chargaff's second parity rule
# (each row of trans.mat holds the transition probabilities from one state and sums to 1)
trans.mat <- matrix(c(.4, .1, .4, .1,
                      .2, .1, .6, .1,
                      .4, .1, .3, .2,
                      .1, .2, .4, .3),
                    ncol=4, byrow=TRUE)
sim.seq <- simulateMarkovChain(500000, trans.mat, states=c("a", "c", "g", "t"))
chargaff.gibbs.test(sim.seq)

[Package spgs version 1.0-4 Index]