read.Structure {polysat} | R Documentation |
Read Genotypes and Other Data from a Structure File
Description
read.Structure creates a genambig object by reading a text file formatted for the software Structure. Ploidies and PopInfo (if available) are also written to the object, and data from additional columns can optionally be extracted as well.
Usage
read.Structure(infile, ploidy, missingin = -9, sep = "\t",
markernames = TRUE, labels = TRUE, extrarows = 1,
popinfocol = 1, extracols = 1, getexcols = FALSE,
ploidyoutput="one")
Arguments
infile |
Character string. The file path to be read. |
ploidy |
Integer. The ploidy of the file, i.e. how many rows there are for each individual. |
missingin |
The symbol used to represent missing data in the Structure file. |
sep |
The character used to delimit the fields of the Structure file (tab by default). |
markernames |
Boolean, indicating whether the file has a header containing marker names. |
labels |
Boolean, indicating whether the file has a column containing sample names. |
extrarows |
Integer. The number of extra rows that the file has, not counting marker names. This could include rows for recessive alleles, inter-marker distances, or phase information. |
popinfocol |
Integer. The column number (after the labels column, if present) where the data to be used for PopInfo are stored. |
extracols |
Integer. The number of extra columns that the file has, not counting sample names (labels) but counting the column to be used for PopInfo. |
getexcols |
Boolean, indicating whether the function should return the data from any extra columns. |
ploidyoutput |
This argument determines what is assigned to the Ploidies slot of the genambig object. |
Details
The current version of read.Structure does not support the ONEROWPERIND option in the file format. Each locus must have only one column. If your data are in ONEROWPERIND format, it should be fairly simple to manipulate them in a spreadsheet program so that they can be read by read.GeneMapper instead.
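The reshaping can also be done in R rather than in a spreadsheet. The following is a minimal sketch, not part of polysat: the data frame onerow, its column layout, and the locus names are hypothetical, and the headers "Sample Name", "Marker" and "Allele 1", "Allele 2" are assumed to be what read.GeneMapper expects (check its help page before relying on them).

```r
# Hypothetical ONEROWPERIND-style data: one row per sample and, for each
# locus, one column per allele copy (here ploidy = 2 and two loci).
onerow <- data.frame(
  Sample = c("IND1", "IND2"),
  LocA_1 = c(100, 104), LocA_2 = c(102, 106),
  LocB_1 = c(200, 210), LocB_2 = c(200, 214),
  stringsAsFactors = FALSE
)
loci   <- c("LocA", "LocB")
ploidy <- 2

# Build one block of rows per locus, then stack the blocks so that each
# row holds the alleles of one sample at one marker.
long <- do.call(rbind, lapply(seq_along(loci), function(i) {
  first   <- 2 + (i - 1) * ploidy          # first allele column of locus i
  alleles <- onerow[, first:(first + ploidy - 1), drop = FALSE]
  names(alleles) <- paste("Allele", seq_len(ploidy))
  cbind(data.frame(`Sample Name` = onerow$Sample, Marker = loci[i],
                   check.names = FALSE, stringsAsFactors = FALSE),
        alleles)
}))
long
```

The resulting table has one row per sample and marker; it would still need to be written to a file in whatever delimiter and missing-data coding read.GeneMapper requires.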
read.Structure uses read.table to initially read the file into a data frame, then extracts information from the data frame. Because of this, any header rows (particularly the one containing marker names) should have leading tabs (or spaces if sep=" ") so that the marker names align correctly with their corresponding genotypes. You should be able to open the file in a spreadsheet program and have everything align correctly.
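The effect of the leading tabs can be checked with base R before importing. This is only an illustration of the alignment requirement; the read.table arguments shown are assumptions and not necessarily the ones read.Structure uses internally.

```r
# A small tab-delimited file whose marker-name row is padded with two
# leading tabs, one for the label column and one for the population column.
f <- tempfile()
writeLines(c("\t\tLoc1\tLoc2",
             "IND1\t1\t100\t200",
             "IND1\t1\t102\t204"), f)

# Every row should contain the same number of fields if the header is
# padded correctly.
count.fields(f, sep = "\t")

# Reading without a header shows the marker names sitting directly above
# their genotype columns (columns 3 and 4).
raw <- read.table(f, sep = "\t", header = FALSE, stringsAsFactors = FALSE)
raw
```

If the header row had no leading tabs, count.fields would report fewer fields for that row than for the genotype rows, which is a quick sign that the marker names will not line up.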
If the file does not contain sample names, set labels=FALSE. The samples will be numbered instead, and if you like you can use the Samples<- function to edit the sample names of the genotype object after import. Likewise, if markernames=FALSE, the loci will be numbered automatically by the column names that read.table creates, but these can also be edited after the fact.
The Ploidies slot of the "genambig" object that is created is initially indexed by both sample and locus, with ploidy being written to the slot on a per-genotype basis. After all genotypes have been imported, reformatPloidies is used to convert Ploidies to the simplest possible format before the object is returned.
Value
If getexcols=FALSE, the function returns only a genambig object.
If getexcols=TRUE, the function returns a list with two elements. The first, named ExtraCol, is a data frame, where the row names are the sample names and each column is one of the extra columns from the file (but with each sample only once instead of being repeated ploidy number of times). The second element is named Dataset and is the genotype object described above.
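As a brief sketch of the getexcols=TRUE case, the two list elements can be pulled apart as shown here. The file contents and the names f and res are made up for illustration, and the example assumes polysat is installed.

```r
library(polysat)

# A small made-up Structure file: marker names, one extra row, then two
# diploid samples with a label column and one population column.
f <- tempfile()
writeLines(c("\t\tLocA\tLocB",
             "\t\t-9\t-9",
             "IND1\t1\t100\t200",
             "IND1\t1\t102\t-9",
             "IND2\t2\t104\t210",
             "IND2\t2\t-9\t214"), f)

res <- read.Structure(f, ploidy = 2, getexcols = TRUE)

# Data frame of the extra columns, one row per sample
res$ExtraCol

# The genambig object itself
res$Dataset
```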
Author(s)
Lindsay V. Clark
References
Hubisz, M. J., Falush, D., Stephens, M. and Pritchard, J. K. (2009) Inferring weak population structure with the assistance of sample group information. Molecular Ecology Resources 9, 1322–1332.
Falush, D., Stephens, M. and Pritchard, J. K. (2007) Inferences of population structure using multilocus genotype data: dominant markers and null alleles. Molecular Ecology Notes 7, 574–578.
See Also
write.Structure, read.GeneMapper, read.Tetrasat, read.ATetra, read.GenoDive, read.SPAGeDi, read.POPDIST, read.STRand
Examples
# create a file to read (normally done in a text editor or spreadsheet
# software)
myfile <- tempfile()
cat("\t\tRhCBA15\tRhCBA23\tRhCBA28\tRhCBA14\tRUB126\tRUB262\tRhCBA6\tRUB26",
"\t\t-9\t-9\t-9\t-9\t-9\t-9\t-9\t-9",
"WIN1B\t1\t197\t98\t152\t170\t136\t208\t151\t99",
"WIN1B\t1\t208\t106\t174\t180\t166\t208\t164\t99",
"WIN1B\t1\t211\t98\t182\t187\t184\t208\t174\t99",
"WIN1B\t1\t212\t98\t193\t170\t203\t208\t151\t99",
"WIN1B\t1\t-9\t-9\t-9\t-9\t-9\t-9\t-9\t-9",
"WIN1B\t1\t-9\t-9\t-9\t-9\t-9\t-9\t-9\t-9",
"WIN1B\t1\t-9\t-9\t-9\t-9\t-9\t-9\t-9\t-9",
"WIN1B\t1\t-9\t-9\t-9\t-9\t-9\t-9\t-9\t-9",
"MCD1\t2\t208\t100\t138\t160\t127\t202\t151\t124",
"MCD1\t2\t208\t102\t153\t168\t138\t207\t151\t134",
"MCD1\t2\t208\t106\t157\t180\t162\t211\t151\t137",
"MCD1\t2\t208\t110\t159\t187\t127\t215\t151\t124",
"MCD1\t2\t208\t114\t168\t160\t127\t224\t151\t124",
"MCD1\t2\t208\t124\t193\t160\t127\t228\t151\t124",
"MCD1\t2\t-9\t-9\t-9\t-9\t-9\t-9\t-9\t-9",
"MCD1\t2\t-9\t-9\t-9\t-9\t-9\t-9\t-9\t-9",
"MCD2\t2\t208\t98\t138\t160\t136\t202\t150\t120",
"MCD2\t2\t208\t102\t144\t174\t145\t214\t150\t132",
"MCD2\t2\t208\t105\t148\t178\t136\t217\t150\t135",
"MCD2\t2\t208\t114\t151\t184\t136\t227\t150\t120",
"MCD2\t2\t208\t98\t155\t160\t136\t202\t150\t120",
"MCD2\t2\t208\t98\t157\t160\t136\t202\t150\t120",
"MCD2\t2\t208\t98\t163\t160\t136\t202\t150\t120",
"MCD2\t2\t208\t98\t138\t160\t136\t202\t150\t120",
"MCD3\t2\t197\t100\t172\t170\t159\t213\t174\t134",
"MCD3\t2\t197\t106\t174\t178\t193\t213\t176\t132",
"MCD3\t2\t-9\t-9\t-9\t-9\t-9\t-9\t-9\t-9",
"MCD3\t2\t-9\t-9\t-9\t-9\t-9\t-9\t-9\t-9",
"MCD3\t2\t-9\t-9\t-9\t-9\t-9\t-9\t-9\t-9",
"MCD3\t2\t-9\t-9\t-9\t-9\t-9\t-9\t-9\t-9",
"MCD3\t2\t-9\t-9\t-9\t-9\t-9\t-9\t-9\t-9",
"MCD3\t2\t-9\t-9\t-9\t-9\t-9\t-9\t-9\t-9",
sep="\n",file=myfile)
# view the file
cat(readLines(myfile), sep="\n")
# read the structure file into genotypes and populations
testdata <- read.Structure(myfile, ploidy=8)
# examine the results
testdata