readDArTag {polyRAD}    R Documentation
Import Data from DArT Sequencing
Description
Diversity Arrays Technology (DArT) provides a tag-based genotyping-by-sequencing
service. Together with Breeding Insight, DArT developed a format indicating
haplotype sequence and read depth, and this function imports that format to make
a RADdata object. The target SNP and all off-target SNPs within the amplicon are
imported as haplotypes.
Because the file format does not indicate which strand of the reference genome
each tag comes from, BLAST results (or, alternatively, a list of bottom-strand
loci supplied via botloci) are used so that sequence and position are accurately
stored in the RADdata object. See the “extdata” folder of the polyRAD
installation for example files.
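The reverse complementing that a bottom-strand tag needs can be illustrated with a minimal base R sketch. The helper revcomp() is hypothetical and is not part of polyRAD; it only complements A, C, G, T, and N.

```r
# Hypothetical helper, not part of polyRAD: reverse complement DNA strings.
# Only A, C, G, T, and N (either case) are complemented; any other
# character is reversed but not complemented.
revcomp <- function(seq) {
  # swap each base for its complement
  comp <- chartr("ACGTNacgtn", "TGCANtgcan", seq)
  # reverse the characters of each string
  vapply(strsplit(comp, ""), function(x) paste(rev(x), collapse = ""),
         character(1), USE.NAMES = FALSE)
}

revcomp("AACGTTG")
# returns "CAACGTT"
```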
Usage
readDArTag(file, botloci = NULL, blastfile = NULL, excludeHaps = NULL,
includeHaps = NULL, n.header.rows = 0, sample.name.row = 1,
trim.sample.names = "_[^_]+_[ABCDEFGH][[:digit:]][012]?$",
sep.counts = ",", sep.blast = "\t", possiblePloidies = list(2),
taxaPloidy = 2L, contamRate = 0.001)
Arguments
file
The file name of a spreadsheet from DArT indicating haplotype sequence and read depth.
botloci
A character vector indicating the names of loci for which the sequence is on the
bottom strand with respect to the reference genome. All other loci are assumed
to be on the top strand. Only one of botloci and blastfile should be provided.
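blastfile
File name for BLAST results for haplotypes. The file should be in tabular format
with column headers and with columns for query, subject, subject start, subject
end, and percent identity (see Details for the Breeding Insight column names).
Only one of botloci and blastfile should be provided.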
excludeHaps
Optional. Character vector with names of haplotypes (from the “AlleleID”
column) that should not be imported. Should not be used if includeHaps is provided.
includeHaps
Optional. Character vector with names of haplotypes (from the “AlleleID”
column) that should be imported. Should not be used if excludeHaps is provided.
n.header.rows
Integer. The number of header rows in file.
sample.name.row
Integer. The row within file that contains sample names.
trim.sample.names
A regular expression indicating text to trim off of sample names. Use "" if
sample names should not be trimmed.
sep.counts
The field separator character for file. The default assumes comma-separated.
sep.blast
The field separator character for the BLAST results. The default assumes tab-delimited.
possiblePloidies
A list indicating possible inheritance modes. See RADdata.
taxaPloidy
A single integer, or an integer vector with one value per taxon, indicating
ploidy. See RADdata.
contamRate
Expected sample cross-contamination rate. See RADdata.
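To show what the default trim.sample.names pattern removes, the sketch below applies it to some invented sample names with base R sub(). It assumes that trimming deletes the matched suffix, as sub() does; it is not taken from polyRAD's source.

```r
# Default value of trim.sample.names, copied from the Usage section
trim_pattern <- "_[^_]+_[ABCDEFGH][[:digit:]][012]?$"

# Hypothetical sample names: genotype, plate ID, well position
raw_names <- c("Sample1_Plate3_A01", "Geno_X_12_Plate1_H12", "Line7_P2_C5")

# Assumption: trimming behaves like sub(), deleting the matched suffix
trimmed <- sub(trim_pattern, "", raw_names)
trimmed
# returns c("Sample1", "Geno_X_12", "Line7")
```

The pattern only matches a final well position such as A01 or H12 together with the one underscore-delimited field before it, so underscores earlier in a genotype name are kept.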
Details
The “CloneID” column is used for locus names, and is assumed to contain
the chromosome (or scaffold) name and position, separated by an underscore.
The position is assumed to refer to the target SNP, which is identified by
comparing the “Ref_001” and “Alt_002” sequences. The position
is then converted to refer to the beginning of the tag (which may have been
reverse complemented depending on BLAST results), since the tag may contain
additional SNPs besides the target SNP. This facilitates accurate export to VCF
using RADdata2VCF.
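The locus-name parsing and position conversion described above can be sketched in base R. All three helpers are hypothetical and are not polyRAD's internal code. The sketch assumes that the position follows the last underscore in the CloneID, that the Ref and Alt sequences have equal length and differ at the target SNP, and that a bottom-strand tag is stored reverse complemented so that it reads along the top strand of the reference.

```r
# Hypothetical helper: split a CloneID such as "Chr1_30668472" into
# chromosome and position. Assumes the position follows the LAST underscore,
# so scaffold names containing underscores stay intact.
parse_cloneid <- function(cloneid) {
  data.frame(chrom = sub("_[^_]*$", "", cloneid),
             pos = as.integer(sub("^.*_", "", cloneid)),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: 1-based offset of the first differing base between
# two equal-length haplotype sequences, taken here as the target SNP.
snp_offset <- function(ref, alt) {
  r <- strsplit(ref, "")[[1]]
  a <- strsplit(alt, "")[[1]]
  stopifnot(length(r) == length(a))
  which(r != a)[1]
}

# Hypothetical helper: top-strand coordinate of the first base of the tag as
# stored. offset is measured on the sequence as given in the DArT file. For a
# bottom-strand tag the stored sequence is assumed to be reverse complemented,
# so the SNP becomes base number (tag_length - offset + 1) of the stored tag.
tag_start <- function(snp_pos, offset, tag_length, bottom_strand = FALSE) {
  if (bottom_strand) {
    snp_pos - (tag_length - offset)
  } else {
    snp_pos - offset + 1L
  }
}

loc <- parse_cloneid("Chr1_30668472")
ref <- "ACGTACGTAC"  # hypothetical Ref_001 sequence
alt <- "ACGTTCGTAC"  # hypothetical Alt_002 sequence
k <- snp_offset(ref, alt)                # 5
tag_start(loc$pos, k, nchar(ref))        # 30668468 if on the top strand
tag_start(loc$pos, k, nchar(ref), TRUE)  # 30668467 if on the bottom strand
```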
Column names for the BLAST file can be “Query”, “Subject”, “S_start”, “S_end”, and “%Identity”, for compatibility with Breeding Insight formats.
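Strand can be inferred from BLAST coordinates: in BLAST tabular output, a hit to the minus strand of the subject has a start coordinate greater than its end coordinate. The sketch below applies that rule to a small hypothetical table using the Breeding Insight column names, producing a vector of the kind botloci expects. It assumes query IDs are AlleleIDs whose locus part precedes “|”, and it is an illustration, not how readDArTag processes blastfile internally.

```r
# Hypothetical BLAST hits using the Breeding Insight column names.
# A real file could be read with
#   read.table(blastfile, header = TRUE, sep = "\t", check.names = FALSE)
# where check.names = FALSE keeps the "%Identity" header from being renamed.
blast <- data.frame(
  Query = c("Chr1_30668472|Ref_001", "Chr1_30668472|Alt_002",
            "Chr2_1500|Ref_001"),
  Subject = c("Chr1", "Chr1", "Chr2"),
  S_start = c(30668468L, 30668468L, 1553L),
  S_end = c(30668521L, 30668521L, 1500L),
  `%Identity` = c(100, 98.1, 100),
  check.names = FALSE, stringsAsFactors = FALSE
)

# In BLAST tabular output, a hit to the minus strand has start > end
blast$strand <- ifelse(blast$S_start > blast$S_end, "bottom", "top")

# Assumed locus name: the part of the query ID before "|"
blast$locus <- sub("\\|.*$", "", blast$Query)

# Character vector of bottom-strand loci, the form that botloci takes
bot <- unique(blast$locus[blast$strand == "bottom"])
bot
# returns "Chr2_1500"
```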
Value
A RADdata object ready for QC and genotype calling. Assuming the “Ref_001” and
“Alt_002” alleles were not excluded, the locTable slot will include columns for
chromosome, position, strand, and reference sequence.
Author(s)
Lindsay V. Clark
References
https://www.diversityarrays.com/
See Also
readTagDigger, VCF2RADdata, readStacks, readTASSELGBSv2, readHMC
Examples
## Older Excellence in Breeding version
# Example files installed with polyRAD
dartfile <- system.file("extdata", "DArTag_example.csv", package = "polyRAD")
blastfile <- system.file("extdata", "DArTag_BLAST_example.txt",
                         package = "polyRAD")
# One haplotype doesn't seem to have correct alignment (see BLAST results)
exclude_hap <- c("Chr1_30668472|RefMatch_004")
# Import data
mydata <- readDArTag(dartfile, blastfile = blastfile,
                     excludeHaps = exclude_hap,
                     possiblePloidies = list(4),
                     n.header.rows = 7, sample.name.row = 7)
## Newer Excellence in Breeding version (2022)
# Example files installed with polyRAD
dartfile <- system.file("extdata", "DArTag_example2.csv", package = "polyRAD")
blastfile <- system.file("extdata", "DArTag_BLAST_example2.txt",
                         package = "polyRAD")
# One haplotype doesn't seem to have correct alignment (see BLAST results)
exclude_hap <- c("Chr1_30668472|RefMatch_0004")
# Import data
mydata <- readDArTag(dartfile, blastfile = blastfile,
                     excludeHaps = exclude_hap,
                     possiblePloidies = list(4),
                     n.header.rows = 0, sample.name.row = 1)
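Judging from the examples, the excludeHaps values are AlleleID strings made of the CloneID, a “|” separator, and a haplotype label. The base R sketch below shows exact-match filtering on such IDs and recovery of the locus name from each kept haplotype. select_haps() is hypothetical; it assumes exact string matching and, as the argument descriptions indicate, that excludeHaps and includeHaps are not meant to be used together.

```r
# Hypothetical AlleleID values, modeled on the older-format example
allele_ids <- c("Chr1_30668472|Ref_001", "Chr1_30668472|Alt_002",
                "Chr1_30668472|RefMatch_004", "Chr2_1500|Ref_001")

# Hypothetical helper: exact-match filtering on AlleleID, refusing to
# combine an exclusion list with an inclusion list
select_haps <- function(ids, excludeHaps = NULL, includeHaps = NULL) {
  if (!is.null(excludeHaps) && !is.null(includeHaps)) {
    stop("use only one of excludeHaps and includeHaps")
  }
  if (!is.null(excludeHaps)) return(ids[!ids %in% excludeHaps])
  if (!is.null(includeHaps)) return(ids[ids %in% includeHaps])
  ids
}

kept <- select_haps(allele_ids, excludeHaps = "Chr1_30668472|RefMatch_004")
# The locus (CloneID) is the part of each AlleleID before "|"
unique(sub("\\|.*$", "", kept))
# returns c("Chr1_30668472", "Chr2_1500")
```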