AmpliconDuo-package {AmpliconDuo} | R Documentation |
Statistical Analysis Of Amplicon Data Of The Same Sample To Identify Spurious Amplicons
Description
Increasingly powerful techniques for high-throughput sequencing make it possible to comprehensively characterize microbial communities, including rare species. However, a still unresolved issue is the substantial error rate of the experimental process that generates these sequences. To overcome this limitation we propose an approach in which each sample is split and the same amplification and sequencing protocol is applied to both halves. Comparing the results of both halves should make it possible to distinguish likely PCR and sequencing artifacts from true rare species.
The AmpliconDuo package is intended to help interpret the obtained amplicon frequency distribution across split samples and to filter out false positive amplicons. From here on, the term ampliconduo refers to the two amplicon data sets of a split sample.
Details
Package: | AmpliconDuo |
Type: | Package |
Version: | 1.1.1 |
Date: | 2020-05-22 |
License: | GPL-2 |
The core of this package is the ampliconduo function, which generates an ampliconduo data frame for each pair of split samples and statistically analyses the data with Fisher's exact test.
Ampliconduo data frames, or lists of these, are the input required for all other functions of this package.
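The per-amplicon test described above can be sketched in base R. This is a minimal illustration and not the package's implementation: the read counts are made up, the layout of the 2x2 table (reads of one amplicon versus reads of all other amplicons, in half A versus half B) is an assumption, and Benjamini-Hochberg adjustment is used as one common way to obtain false discovery rates.

```r
# Toy read counts for three amplicons in the two halves (A, B) of one split sample.
# All numbers are made up for illustration.
freq_a <- c(120, 30, 0)
freq_b <- c(115, 2, 9)

# One Fisher's exact test per amplicon on the 2x2 table
#   rows:    reads of this amplicon, reads of all other amplicons
#   columns: half A, half B
# (assumed table layout, see the text above).
fisher_p <- function(i) {
  tab <- matrix(
    c(freq_a[i], sum(freq_a) - freq_a[i],
      freq_b[i], sum(freq_b) - freq_b[i]),
    nrow = 2
  )
  fisher.test(tab)$p.value
}
p_values <- vapply(seq_along(freq_a), fisher_p, numeric(1))

# Adjust for multiple testing; Benjamini-Hochberg is an assumption here.
q_values <- p.adjust(p_values, method = "BH")

result <- data.frame(freq_a, freq_b, p = p_values, q = q_values)
print(result)
```

Amplicons with a small adjusted value are those whose read counts differ between the two halves more than expected by chance, which is what the plotting and filtering functions of the package build on.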
plotAmpliconduo plots, for an ampliconduo, the amplicon frequencies (number of reads per amplicon) of sample A vs. the amplicon frequencies of sample B, highlighting amplicons that display a significant deviation between both samples.
plotAmpliconduo.set does the same as plotAmpliconduo but accepts a list of ampliconduo data frames and arranges the plots in a 2-dimensional array.
plotORdensity generates a histogram plot of the amplicon frequency odds ratio density for an ampliconduo data frame. For multiple data frames, it arranges the plots in a 2-dimensional array.
discordance.delta calculates delta (\Delta) and delta prime (\Delta'), the fraction of amplicon frequencies and amplicons, respectively, with a false discovery rate below a certain threshold \theta, as a measure of discordance between two amplicon data sets A and B.
filter.ampliconduo applies filter criteria to an ampliconduo data frame, deciding which amplicons are going to be rejected.
filter.ampliconduo.set does the same as filter.ampliconduo for a list of ampliconduo data frames.
accepted.amplicons returns the indices of those amplicons that have passed the filter criteria.
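To make the discordance measures and the filtering step concrete, the following base R sketch computes both fractions and a simple acceptance rule on toy data. It is an illustration under stated assumptions, not the package's code: the counts and q-values are invented, the variable names are hypothetical, reads are summed over both halves when computing \Delta, and the filter rule (at least min_freq reads in both halves and not flagged as discordant) is one plausible reading of the min.freq and q criteria.

```r
# Toy data: read counts in both halves and made-up q-values for five amplicons.
freq_a <- c(500, 40, 12, 3, 0)
freq_b <- c(480, 5, 10, 4, 6)
q      <- c(0.90, 0.001, 0.75, 0.95, 0.03)
theta  <- 0.05

# Amplicons whose false discovery rate falls below the threshold theta.
discordant <- q < theta

# Delta prime: fraction of amplicons flagged as discordant.
delta_prime <- mean(discordant)

# Delta: fraction of reads belonging to flagged amplicons
# (reads summed over both halves; this weighting is an assumption).
reads <- freq_a + freq_b
delta <- sum(reads[discordant]) / sum(reads)

# Simple filter (assumed rule): keep amplicons with at least min_freq reads
# in both halves that are not flagged as discordant.
min_freq <- 1
accepted <- which(freq_a >= min_freq & freq_b >= min_freq & !discordant)

print(c(delta = delta, delta_prime = delta_prime))
print(accepted)
```

In this toy case two of five amplicons are flagged, but they carry only a small share of the reads, so \Delta is much smaller than \Delta'.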
Author(s)
Anja Lange (anja.lange@uni-due.de) and Daniel Hoffmann (daniel.hoffmann@uni-due.de)
Maintainer: Anja Lange (anja.lange@uni-due.de)
References
Lange A, Jost S, Heider D, Bock C, Budeus B, et al. (2015) AmpliconDuo: A Split-Sample Filtering Protocol for High-Throughput Amplicon Sequencing of Microbial Communities. PLOS ONE 10(11): e0141590
Examples
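## attach the package (needed when running the examples outside of example())
library(AmpliconDuo)
## load the test amplicon frequency data ampliconfreqs and the vector of sample names site.f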
data(ampliconfreqs)
data(site.f)
## generating ampliconduo data frames
## depending on the size of the data sets, this may take some time
ampliconduoset <- ampliconduo(ampliconfreqs[,1:4], sample.names = site.f[1:2])
## plot amplicon read numbers of sample A vs. amplicon read numbers of sample B,
## indicating amplicons with significant deviations in their occurrence across samples
plotAmpliconduo.set(ampliconduoset, nrow = 3)
## calculate discordance between the two data sets of an ampliconduo
discordance <- discordance.delta(ampliconduoset)
## plot the odds ratio density of ampliconduo data
plotORdensity(ampliconduoset)
## apply filter criteria to remove/mark spurious amplicons
ampliconduoset.f <- filter.ampliconduo.set(ampliconduoset, min.freq = 1, q = 0.05)
## return indices of accepted amplicons; these correspond to indices of the ampliconfreqs data
## that were used as input for the ampliconduo function
accep.reads <- accepted.amplicons(ampliconduoset.f)