R: Estimate TE dynamics using mismatch data

EstDynamics {TE}

R Documentation

Estimate TE dynamics using mismatch data

Description

Given the number of mismatches and the length of each element in an LTR retrotransposon family, estimate the family's age distribution, insertion rate, and deletion rates.

Usage

EstDynamics(mismatch, len, r = 0.013, perturb = 2, rateRange = NULL,
  plotFit = FALSE, plotSensitivity = FALSE, pause = plotFit &&
  plotSensitivity, main = sprintf("n = %d", n))

EstDynamics2(mismatch, len, r = 0.013, nTrial = 10L, perturb = 2,
  rateRange = NULL, plotFit = FALSE, plotSensitivity = FALSE,
  pause = plotFit && plotSensitivity, ...)

Arguments

`mismatch`	A vector containing the number of mismatches for each element.
`len`	A vector containing the length of each element.
`r`	Mutation rate, in substitutions per site per million years, used in the calculation.
`perturb`	A scalar multiplier used to perturb the death rate away from its estimate under the null hypothesis. The perturbed rates are used in the sensitivity analysis.
`rateRange`	A vector of death rates to use in the sensitivity analysis; an alternative to `perturb`.
`plotFit`	Whether to plot the distribution fits.
`plotSensitivity`	Whether to plot the sensitivity analysis.
`pause`	Whether to pause after each plot.
`main`	The title for the plot.
`nTrial`	The number of starting points for searching for the MLE.
`...`	Further arguments passed to EstDynamics.
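
As a rough illustration of how `mismatch`, `len`, and `r` relate, the sketch below converts mismatch counts into point estimates of element age. It assumes the common LTR dating relation age = divergence / (2 * r), where the factor 2 reflects both LTRs accumulating substitutions independently. The input values are hypothetical, and this is not necessarily the calculation EstDynamics performs internally; see References for the actual model.

```r
# Hypothetical example values, not taken from the AetLTR data
mismatch <- c(3, 10, 26)        # number of mismatches per element
len      <- c(500, 1000, 1300)  # length of each element (bp)
r        <- 0.013               # substitutions per site per million years

# Divergence: proportion of mismatched sites
divergence <- mismatch / len

# Assumed relation: both LTRs mutate independently, hence the factor 2
age_my <- divergence / (2 * r)

round(age_my, 3)  # ages in million years
```

Because this per-element conversion ignores sampling noise in the mismatch counts, it is only a quick sanity check on the scale of the ages, not a substitute for the fitted age distribution.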

Details

EstDynamics estimates the TE dynamics by fitting a negative binomial distribution to the mismatch data, while EstDynamics2 fits a mixture model. For implementation details, see References.
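
To make the idea of a negative binomial fit concrete, here is a minimal base R sketch that fits a negative binomial distribution to count data by maximum likelihood. The data are simulated, the helper `negLogLik` is a hypothetical name, and the sketch ignores element length; it illustrates the general technique only and is not the fitting procedure implemented in EstDynamics.

```r
set.seed(1)

# Simulated mismatch counts (illustrative only, not real TE data)
x <- rnbinom(500, size = 3, mu = 12)

# Negative log-likelihood; parameters are optimised on the log scale
# so that size and mu stay positive
negLogLik <- function(par) {
  -sum(dnbinom(x, size = exp(par[1]), mu = exp(par[2]), log = TRUE))
}

# Nelder-Mead search starting from size = 1 and the sample mean
fit <- optim(c(0, log(mean(x))), negLogLik)

est <- c(size = exp(fit$par[1]), mu = exp(fit$par[2]))
est          # maximum likelihood estimates
-fit$value   # maximised log-likelihood
```

A mixture model, as used by EstDynamics2, has a likelihood surface with multiple local maxima, which is why several starting points (`nTrial`) are used there rather than the single start shown here.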

Value

EstDynamics returns a TEfit object containing the following fields. Time is measured in million years ago (Mya).

`pvalue`	The p-value for testing H_0: The insertion rate is uniform over time.
`ageDist`	A list containing the estimated age distributions.
`insRt`	A list containing the estimated insertion rates.
`agePeakLoc`	The maximum point (in age) of the age distribution.
`insPeakLoc`	The maximum point (in time) of the insertion rate.
`estimates`	The parameter estimates from fitting the distributions; see References.
`sensitivity`	A list containing the results for the sensitivity analysis, with fields `time`: time points; `delRateRange`: A vector for the range of deletion rates; `insRange`: A matrix whose columns contain the insertion rates under different scenarios.
`n`	The sample size.
`meanLen`	The mean element length.
`meanDiv`	The mean divergence.
`KDE`	A list containing the kernel density estimate for the mismatch data.
`logLik`	The log-likelihoods of the parametric fits.

EstDynamics2 returns a TEfit2 object containing all of the TEfit fields above, plus the following:

`estimates2`	The parameter estimates from fitting the mixture distribution.
`ageDist2`	The estimated age distribution from fitting the mixture distribution.
`insRt2`	The estimated insertion rate from fitting the mixture distribution.
`agePeakLoc2`	Maximum point(s) for the age distribution.
`insPeakLoc2`	Maximum point(s) for the insertion rate.

References

Dai, X., Wang, H., Dvorak, J., Bennetzen, J., and Mueller, H.-G. (2018). "Birth and Death of LTR Retrotransposons in Aegilops tauschii". Genetics.

Examples

# Analyze Gypsy family 24 (Nusif)
data(AetLTR)
dat <- subset(AetLTR, GroupID == 24 & !is.na(Chr))
set.seed(1)
res1 <- EstDynamics(dat$Mismatch, dat$UngapedLen, plotFit=TRUE, plotSensitivity=FALSE, pause=FALSE)

# p-value for testing a uniform insertion rate
res1$pvalue


# Use a mixture distribution to improve fit
res2 <- EstDynamics2(dat$Mismatch, dat$UngapedLen, plotFit=TRUE)

# A larger number of trials is recommended to achieve the global MLE
## Not run: 
res3 <- EstDynamics2(dat$Mismatch, dat$UngapedLen, plotFit=TRUE, nTrial=1000L)

## End(Not run)

[Package TE version 0.3-0 Index]