structure2conStruct {conStruct} | R Documentation |
Convert a dataset from STRUCTURE to conStruct format
Description
structure2conStruct converts a STRUCTURE dataset to conStruct format.
Usage
structure2conStruct(
infile,
onerowperind,
start.loci,
start.samples = 1,
missing.datum,
outfile
)
Arguments
infile |
The name and path of the file in STRUCTURE format to be converted to conStruct format. |
onerowperind |
Indicates whether the file format has one row per individual (TRUE) or two rows per individual (FALSE). |
start.loci |
The index of the first column in the dataset that contains genotype data. |
start.samples |
The index of the first row in the dataset that contains genotype data (e.g., after any headers). Default value is 1. |
missing.datum |
The character or value used to denote missing data in the STRUCTURE dataset (often 0 or -9). |
outfile |
The name and path of the file containing the conStruct formatted dataset to be generated by this function. |
Details
This function takes a population genetics dataset in STRUCTURE format and converts it to conStruct format. The STRUCTURE file can have one row per individual and two columns per locus, or two rows per individual and one column per locus. It can only contain bi-allelic SNPs. Missing data are acceptable, but must be indicated with a single value throughout the dataset.
This function takes a STRUCTURE format data file and converts it to a conStruct format data file.
This function can only be applied to diploid organisms.
The STRUCTURE data file must be a plain text file.
If there is extraneous text or column headers before the data start, those extra lines should be deleted by hand or taken into account via the start.samples argument.
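One way to choose a start.samples value is to look at the first few lines of the file before converting it. The sketch below uses only base R; the toy file, its header line, and the variable names are invented for illustration and are not part of the package.

```r
# Write a toy STRUCTURE file with one header line (hypothetical data).
infile <- tempfile(fileext = ".str")
writeLines(c(
  "id pop loc1 loc1 loc2 loc2",
  "ind1 1 1 2 2 2",
  "ind2 1 1 1 -9 -9",
  "ind3 2 2 2 1 2"
), infile)

# Inspect the top of the file to see where the samples begin.
first_lines <- readLines(infile, n = 5)
print(first_lines)

# In this toy file the first sample sits on line 2.
start.samples <- 2

# Reading from that row onward leaves one row per individual.
dat <- read.table(infile, header = FALSE, skip = start.samples - 1,
                  stringsAsFactors = FALSE)
print(dim(dat))
```

If the data begin on the first line of the file, the default start.samples = 1 applies and nothing needs to be skipped.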
The STRUCTURE dataset can either be in the ONEROWPERIND=1 file format, with one row per individual and two columns per locus, or the ONEROWPERIND=0 format, with two rows per individual and one column per locus. The first column of the STRUCTURE dataset should be individual names. There may be any number of other columns that contain non-genotype information before the first column that contains genotype data, but there can be no extraneous columns at the end of the dataset, after the genotype data.
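To make the two layouts concrete, the sketch below encodes the same three made-up individuals and two loci in both formats and derives the counts that follow from each. It uses only base R; the sample names, the population column, and the allele codes are assumptions chosen for illustration.

```r
# Columns: sample name, population label, then genotype data.

# ONEROWPERIND=1: one row per individual, two columns per locus.
one_row <- c(
  "ind1 1 1 2 2 2",
  "ind2 1 1 1 -9 -9",
  "ind3 2 2 2 1 2"
)

# ONEROWPERIND=0: two rows per individual, one column per locus.
two_rows <- c(
  "ind1 1 1 2",
  "ind1 1 2 2",
  "ind2 1 1 -9",
  "ind2 1 1 -9",
  "ind3 2 2 1",
  "ind3 2 2 2"
)

wide <- read.table(text = one_row, header = FALSE)
long <- read.table(text = two_rows, header = FALSE)

# Genotype data begin in the third column in both layouts.
start.loci <- 3

n_loci_wide <- (ncol(wide) - start.loci + 1) / 2  # two columns per locus
n_loci_long <- ncol(long) - start.loci + 1        # one column per locus
n_ind_wide <- nrow(wide)                          # one row per individual
n_ind_long <- nrow(long) / 2                      # two rows per individual
```

Both layouts describe 3 individuals and 2 loci. Because no columns may follow the genotype data, the number of genotype columns should be even in the one-row layout, and the number of rows should be even in the two-row layout.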
The genotype data must be bi-allelic single nucleotide polymorphisms (SNPs). Applying this function to datasets with more than two alleles per locus may result in cryptic failure. For more details, see the format-data vignette.
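Because a locus with more than two alleles may fail without a clear error, it can help to screen the genotype columns before conversion. The following is a minimal base R sketch, not part of conStruct: it assumes the one-row-per-individual layout with the non-genotype columns already removed, and the toy data deliberately include one tri-allelic locus.

```r
# Toy genotype block: 3 individuals, 3 loci, two columns per locus.
genos <- read.table(text = c(
  "1 2 2 2 1 3",
  "1 1 -9 -9 2 3",
  "2 2 1 2 1 1"
), header = FALSE)
missing.datum <- -9

# Count the distinct non-missing alleles at each locus (pair of columns).
n_loci <- ncol(genos) / 2
n_alleles <- sapply(seq_len(n_loci), function(l) {
  alleles <- unlist(genos[, c(2 * l - 1, 2 * l)])
  length(unique(alleles[alleles != missing.datum]))
})

# Loci with more than two alleles should be removed before conversion.
bad_loci <- which(n_alleles > 2)
print(bad_loci)
```

Here the third locus carries alleles 1, 2, and 3, so it is the only one flagged.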
Value
This function returns an allele frequency data matrix that can be used as the freqs argument in a conStruct analysis run using conStruct. It also saves this object as an .RData file so that it can be used in future analyses.
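A call might look like the sketch below. The toy input file is made up for illustration, the argument values are assumptions matched to that file, and onerowperind is assumed to take a logical value. The conversion step is skipped when the conStruct package is not installed.

```r
# Toy STRUCTURE file: sample name, population label, then two loci
# in the one-row-per-individual layout (hypothetical data).
infile <- tempfile(fileext = ".str")
writeLines(c(
  "ind1 1 1 2 2 2",
  "ind2 1 1 1 -9 -9",
  "ind3 2 2 2 1 2"
), infile)

# A fresh output path that does not exist yet.
outfile <- tempfile()

if (requireNamespace("conStruct", quietly = TRUE)) {
  freqs <- conStruct::structure2conStruct(
    infile = infile,
    onerowperind = TRUE,   # one row per individual
    start.loci = 3,        # genotypes begin in the third column
    start.samples = 1,     # no header lines to skip
    missing.datum = -9,    # missing-data code used in the toy file
    outfile = outfile
  )
  # Inspect the returned allele frequency matrix.
  str(freqs)
}
```

The returned matrix is what would then be passed as the freqs argument of a conStruct analysis.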