ReplMatch {MHCtools}    R Documentation
ReplMatch() function
Description
In amplicon filtering, it is sometimes valuable to compare technical
replicates to estimate the accuracy of a genotyping experiment, both to
optimize filtering settings and to estimate repeatability for reporting in a
publication. ReplMatch() automatically compares the technical replicates in an
amplicon filtering data set and reports the proportion of mismatches. The
functions GetReplTable() and GetReplStats() can then be used to evaluate the
output files.
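To make "proportion of mismatches" concrete, here is a minimal sketch of one plausible per-set score: for each replicate, the share of the sequence variants observed anywhere in the set that this replicate lacks, averaged over the replicates. The helper mismatch_prop() and the toy data are hypothetical and not part of MHCtools; ReplMatch() may compute its mean proportion differently.

```r
# Hypothetical helper, not part of MHCtools: one plausible way to score
# incongruence within a single replicate set.
mismatch_prop <- function(seq_table, samples) {
  # Presence/absence of each sequence variant in each replicate
  present <- seq_table[samples, , drop = FALSE] > 0
  # Variants observed in at least one replicate of the set
  observed <- colSums(present) > 0
  if (!any(observed)) return(NaN)  # all replicates have 0 sequences
  # For each replicate, the share of the set's variants it lacks
  per_repl <- rowSums(!present[, observed, drop = FALSE]) / sum(observed)
  mean(per_repl)
}

# Toy data: two replicates of one individual, three sequence variants
toy <- matrix(c(120, 80,  0,
                110, 75, 15),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("ind1_a", "ind1_b"),
                              c("ACGT", "ACGA", "TCGA")))
mismatch_prop(toy, c("ind1_a", "ind1_b"))  # 0.1666667
```

Here ind1_a lacks one of the three variants seen in the set (1/3) and ind1_b lacks none (0), giving a mean of 1/6. Returning NaN for an all-zero set mirrors the behaviour described under Details.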
Usage
ReplMatch(repl_table, seq_table, path_out)
Arguments
repl_table
A table containing the sample names of the technical replicates in the data set. The table should be organized with the sample names in the first column (Sample_ID) and the index number of the replicate set in the second column (Replic_set). Replicate sets may contain more than two replicates, but the sets must be numbered consecutively beginning at 1.
seq_table
A sequence table as output by the 'dada2' pipeline, with samples in rows and nucleotide sequence variants in columns.
path_out
A user-defined path to the folder where the output files will be saved.
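The following sketch builds small toy inputs in the layout described above and runs two sanity checks before calling ReplMatch(). The data are hypothetical, and the check that every Sample_ID appears as a row name of seq_table is an assumption about how the two tables are matched, not a documented requirement.

```r
# Hypothetical toy inputs (not MHCtools data).
# repl_table: Sample_ID in column 1, Replic_set in column 2;
# replicate set 2 has three replicates.
repl_table <- data.frame(
  Sample_ID  = c("ind1_a", "ind1_b", "ind2_a", "ind2_b", "ind2_c"),
  Replic_set = c(1, 1, 2, 2, 2),
  stringsAsFactors = FALSE
)

# seq_table: samples in rows, sequence variants in columns, read counts in
# cells (a real dada2 table names its columns by the full nucleotide sequence)
seq_table <- matrix(
  c(120, 80,  0,
    110, 75, 15,
      0, 60, 90,
      0, 55, 85,
      0, 58, 80),
  nrow = 5, byrow = TRUE,
  dimnames = list(repl_table$Sample_ID,
                  c("ACGTACGT", "ACGAACGA", "TCGATCGA"))
)

# Check 1: replicate sets are numbered consecutively beginning at 1
sets <- sort(unique(repl_table$Replic_set))
sets_ok <- identical(as.numeric(sets), as.numeric(seq_along(sets)))

# Check 2 (assumption): every Sample_ID is a row name of seq_table
names_ok <- all(repl_table$Sample_ID %in% rownames(seq_table))

c(sets_ok = sets_ok, names_ok = names_ok)
```

Both checks return TRUE for this toy data; numbering the sets 1 and 3 instead would make sets_ok FALSE.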
Details
Note: ReplMatch() will throw a warning if all samples in a replicate set have 0 sequences. In that case, the mean_props for that replicate set and the repeatability for the data set will be NaN, and ReplMatch() will report which replicate set is problematic and suggest removing it from repl_table. If you remove replicate sets, make sure that the remaining sets in repl_table are still numbered consecutively beginning at 1.
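A minimal sketch of the clean-up suggested above, using a hypothetical helper drop_empty_sets() that is not part of MHCtools: it flags replicate sets in which all samples have 0 reads, removes them from repl_table and renumbers the remaining sets consecutively from 1. Treating samples that are missing from seq_table as empty is an illustrative choice.

```r
# Hypothetical helper, not part of MHCtools
drop_empty_sets <- function(repl_table, seq_table) {
  # Total reads per replicate sample, looked up by sample name
  reads <- rowSums(seq_table)[as.character(repl_table$Sample_ID)]
  reads[is.na(reads)] <- 0  # samples missing from seq_table count as empty
  # A set is empty when all its samples have 0 reads (counts are >= 0)
  set_reads <- tapply(reads, repl_table$Replic_set, sum)
  empty <- as.numeric(names(set_reads)[set_reads == 0])
  if (length(empty) > 0) {
    message("Removing empty replicate set(s): ", paste(empty, collapse = ", "))
  }
  kept <- repl_table[!repl_table$Replic_set %in% empty, , drop = FALSE]
  # Renumber consecutively from 1, preserving the original order of sets
  kept$Replic_set <- match(kept$Replic_set, sort(unique(kept$Replic_set)))
  rownames(kept) <- NULL
  kept
}

# Toy data: replicate set 2 (s3, s4) has no reads at all
repl_table <- data.frame(Sample_ID  = paste0("s", 1:6),
                         Replic_set = c(1, 1, 2, 2, 3, 3),
                         stringsAsFactors = FALSE)
seq_table <- matrix(c(10, 5,  8, 0,  0, 0,  0, 0,  7, 3,  6, 2),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(paste0("s", 1:6), c("ACGT", "TGCA")))
cleaned <- drop_empty_sets(repl_table, seq_table)
cleaned  # sets 1 and 3 are kept and renumbered 1 and 2
```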
If you publish data or results produced with MHCtools, please cite both of the following references:
Roved, J. 2022. MHCtools: Analysis of MHC data in non-model species. CRAN.
Roved, J., Hansson, B., Stervander, M., Hasselquist, D., & Westerdahl, H. 2022. MHCtools - an R package for MHC high-throughput sequencing data: genotyping, haplotype and supertype inference, and downstream genetic analyses in non-model organisms. Molecular Ecology Resources. https://doi.org/10.1111/1755-0998.13645
Value
A set of R lists containing, for each replicate set, the observed sequence variants, the names of the sequences that were incongruent between the replicates, and the mean proportion of incongruent sequences (if the replicates are expected to match completely, this proportion is equivalent to an error rate in the sequencing process). In the output, sequences are named by an index number corresponding to their column number in the sequence table, so identical sequences have identical names in all the output files. These files can be reopened in R, e.g. using the readRDS() function in the base package.
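A minimal sketch for reopening the output with readRDS(). It assumes that the files written to path_out carry an .rds extension and reads all of them, so no specific output file names are hard-coded; the helper read_repl_output() is hypothetical and not part of MHCtools.

```r
# Hypothetical helper, not part of MHCtools.
# Assumption: ReplMatch() output files in path_out end in ".rds".
read_repl_output <- function(path_out) {
  rds_files <- list.files(path_out, pattern = "\\.rds$",
                          full.names = TRUE, ignore.case = TRUE)
  # Read every file and name each element after its file
  out <- lapply(rds_files, readRDS)
  names(out) <- basename(rds_files)
  out
}
```

After running ReplMatch(repl_table, seq_table, path_out), results <- read_repl_output(path_out) returns a named list with one element per file, and str(results, max.level = 1) gives a quick overview of its contents.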
See Also
GetReplTable; GetReplStats; for more information about 'dada2', visit <https://benjjneb.github.io/dada2/>
Examples
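library(MHCtools)
# Example data sets included in MHCtools
repl_table <- replicates_table
seq_table <- sequence_table_repl
# Save the output files in a temporary folder
path_out <- tempdir()
ReplMatch(repl_table, seq_table, path_out)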