R: Read Genotype Data in POPDIST Format

read.POPDIST {polysat}

R Documentation

Read Genotype Data in POPDIST Format

Description

read.POPDIST reads one or more text files formatted for the software POPDIST and produces a "genambig" object containing genotypes, ploidies, and population identities from the file(s).

Usage

read.POPDIST(infiles)

Arguments

infiles

A character vector of file paths to be read.

Details

The format for the software POPDIST is a modified version of the popular Genepop format. The first line is a comment line, followed by a list of locus names, each on a separate line or on one line separated by commas. A line starting with the string “Pop” (“pop” and “POP” are also recognized) indicates the beginning of data for one population. Each individual is then represented on one line, with the population name and individual genotype separated by a tab followed by a comma. Genotypes for different loci are separated by a tab or space. Each allele must be coded by two digits. Zeros (“00”) indicate missing data, either for an entire locus or for a partially heterozygous genotype. Partially heterozygous genotypes can also be represented by the arbitrary duplication of alleles.

If more than one file is read at once, locus names must be consistent across all files. Locus and population names should not start with “Pop”, “pop”, or “POP”, as read.POPDIST searches for these character strings in order to identify the lines that delimit populations.

Value

A "genambig" object. The Description slot of the object is taken from the comment line of the first file. Locus names are taken from the files, and samples are given numbers instead of names. Each genotype consists of all unique non-zero integers for a given sample and locus. The Ploidies slot is filled in based on how many alleles are present at each locus of each sample (the number of characters for the genotype, divided by two). reformatPloidies is used internally by the function to collapse the ploidies to the simplest format. Population names are taken from the individual genotype lines, and population identities are recorded based on how the individuals are delimited by “Pop” lines.

Author(s)

Lindsay V. Clark

References

Tomiuk, J., Guldbrandtsen, B. and Loeschcke, B. (2009) Genetic similarity of polyploids: a new version of the computer program POPDIST (version 1.2.0) considers intraspecific genetic differentiation. Molecular Ecology Resources 9, 1364-1368.

Guldbrandtsen, B., Tomiuk, J. and Loeschcke, B. (2000) POPDIST version 1.1.1: A program to calculate population genetic distance and identity measures. Journal of Heredity 91, 178-179.

Examples

# Create a file to read (this is typically done in a text editor)
myfile <- tempfile()
cat("An example for the read.POPDIST documentation.",
"abcR",
"abcQ",
"Pop",
"Piscataqua\t, 0204 0505",
"Piscataqua\t, 0404 0307",
"Piscataqua\t, 050200 030509",
"Pop",
"Salmon Falls\t, 1006\t0805",
"Salmon Falls\t, 0510\t0308",
"Pop",
"Great Works\t, 050807 030800",
"Great Works\t, 0000 0408",
"Great Works\t, 0707 0305",
file=myfile, sep="\n")

# View the file in the R console (or open it in a text editor)
cat(readLines(myfile), sep="\n")

# Read the file into a "genambig" object
fishes <- read.POPDIST(myfile)

# View the data in the object
summary(fishes)
PopNames(fishes)
PopInfo(fishes)
Ploidies(fishes)
viewGenotypes(fishes)

[Package polysat version 1.7-7 Index]