R: Read Genotypes Produced by STRand Software

read.STRand {polysat}

R Documentation

Read Genotypes Produced by STRand Software

Description

This function reads in data in a format derived from the “BTH” format for exporting genotypes from the allele calling software STRand.

Usage

read.STRand(file, sep = "\t", popInSam = TRUE)

Arguments

`file`	A text string indicating the file to read.
`sep`	Field delimiter for the file. Tab by default.
`popInSam`	Boolean. If `TRUE`, fields from the “Pop” and “Ind” columns will be concatenated to create a sample name. If `FALSE`, only the “Ind” column will be used for sample names.

Details

This function does not read files exported directly from STRand; some simple clean-up in spreadsheet software is required first. The BTH format in STRand produces two columns per locus. One column of each pair should be deleted so that there is just one column per locus, with locus names remaining in the column headers. The file must contain an “Ind” column, holding either full sample names or a sample suffix to be concatenated with the population name (see the `popInSam` argument); the existing column of sample names can be renamed “Ind” for this purpose, or deleted and replaced. A “Pop” column containing population names must also be added.
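
The same clean-up can be done in R rather than in a spreadsheet. The following is a minimal sketch in base R, not part of polysat. The data frame `raw`, its column names (`Sample`, `LocD.1`, and so on), and the population labels are all invented for illustration, and the choice to keep the first column of each pair is an assumption; check your own export to see which column holds the genotypes you want.

```r
# Hypothetical raw export: a sample-name column plus two columns per locus.
# All column names and values here are invented for illustration.
raw <- data.frame(Sample = c("OH1", "OH5", "MT4", "MT7"),
                  LocD   = c("0", "172/174", "170/172/178*", "172/176"),
                  LocD.1 = c("", "", "", ""),
                  LocG   = c("130/136/138/142*", "132/136", "138", "132/140/144*"),
                  LocG.1 = c("", "", "", ""),
                  stringsAsFactors = FALSE)

# Assumption: the column to keep is the first of each pair.
loci <- c("LocD", "LocG")

cleaned <- data.frame(Pop = c("P1", "P1", "P2", "P2"),  # population names added by hand
                      Ind = raw$Sample,                  # sample names moved to "Ind"
                      raw[, loci, drop = FALSE],         # one column per locus
                      stringsAsFactors = FALSE)

# Write a tab-delimited file with locus names in the header row.
cleanfile <- tempfile()
write.table(cleaned, file = cleanfile, sep = "\t",
            row.names = FALSE, quote = FALSE)
```

Because “Ind” holds full sample names in this sketch, the resulting file would be read with `read.STRand(cleanfile, popInSam = FALSE)`.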

STRand adds an asterisk to the end of any genotype with more than two alleles. read.STRand will automatically strip this asterisk out of the genotype.

Missing data is indicated by a zero in the file.
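
To make the two conventions above concrete, the sketch below shows how a single genotype field could be interpreted in base R. It is an illustration only and is not polysat's internal code: `parse_genotype` is a hypothetical helper, the "/" allele separator is taken from the example files on this page, and returning `NA` for missing data is a choice made for this sketch rather than the way a "genambig" object stores missing genotypes.

```r
# Illustration only: a hypothetical helper, not polysat's implementation.
parse_genotype <- function(x) {
  x <- sub("\\*$", "", x)             # drop the trailing asterisk, if any
  if (x == "0") return(NA_integer_)   # a lone zero marks missing data
  as.integer(strsplit(x, "/", fixed = TRUE)[[1]])  # split alleles on "/"
}

genos <- c("0", "172/174", "170/172/178*", "138")
lapply(genos, parse_genotype)
```

The regular expression is anchored to the end of the string, so only a trailing asterisk is removed and allele sizes are left unchanged.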

Value

A "genambig" object containing genotypes, locus and sample names, population names, and population identities from the file.

Author(s)

Lindsay V. Clark

References

https://vgl.ucdavis.edu/STRand

Toonen, R. J. and Hughes, S. (2001) Increased Throughput for Fragment Analysis on ABI Prism 377 Automated Sequencer Using a Membrane Comb and STRand Software. BioTechniques 31, 1320–1324.

Examples

# generate file to read
strtemp <- data.frame(Pop=c("P1","P1","P2","P2"),
                      Ind=c("a","b","a","b"),
                      LocD=c("0","172/174","170/172/178*","172/176"),
                      LocG=c("130/136/138/142*","132/136","138","132/140/144*"))
myfile <- tempfile()
write.table(strtemp, file=myfile, sep="\t",
            row.names=FALSE, quote=FALSE)

# read the file
mydata <- read.STRand(myfile)
viewGenotypes(mydata)
PopNames(mydata)

# alternative example with popInSam=FALSE
strtemp$Ind <- c("OH1","OH5","MT4","MT7")
write.table(strtemp, file=myfile, sep="\t",
            row.names=FALSE, quote=FALSE)
mydata <- read.STRand(myfile, popInSam=FALSE)
Samples(mydata)
PopNames(mydata)

[Package polysat version 1.7-7 Index]