plotSSAllo {polysat} | R Documentation |
Perform Allele Assignments across Entire Dataset
Description
processDatasetAllo runs alleleCorrelations on every locus in a "genambig" object, then runs testAlGroups on every locus using several user-specified parameter sets. It chooses a single best set of allele assignments for each locus and produces plots to help the user evaluate assignment quality.
plotSSAllo assists the user in evaluating the quality of allele assignments by plotting the results of K-means clustering. plotParamHeatmap assists the user in choosing the best parameter set for testAlGroups for each locus.
Usage
plotSSAllo(AlCorrArray)
plotParamHeatmap(propMat, popname = "AllInd", col = grey.colors(12)[12:1], main = "")
processDatasetAllo(object, samples = Samples(object), loci = Loci(object),
n.subgen = 2, SGploidy = 2, n.start = 50, alpha = 0.05,
parameters = data.frame(tolerance = c(0.05, 0.05, 0.05, 0.05),
swap = c(TRUE, FALSE, TRUE, FALSE),
null.weight = c(0.5, 0.5, 0, 0)),
plotsfile = "alleleAssignmentPlots.pdf", usePops = FALSE, ...)
Arguments
AlCorrArray |
A two-dimensional list, where each item in the list is the output of |
propMat |
A two-dimensional array, with loci in the first dimension and parameter sets in the
second dimension, indicating the proportion of alleles that were found to be homoplasious by |
popname |
The name of the population corresponding to the data in |
col |
The color scale for representing the proportion of alleles that are homoplasious or the proportion of genotypes that are missing. |
main |
A title for the plot. |
object |
A |
samples |
An optional character vector indicating which samples to include in analysis. |
loci |
An optional character vector indicating which loci to include in analysis. |
n.subgen |
The number of isoloci into which each locus should be split. Passed directly to
|
SGploidy |
The ploidy of each isolocus. Passed directly to |
n.start |
Passed directly to the |
alpha |
The significance threshold for determining whether two alleles are significantly correlated. Used
primarily for identifying potentially problematic positive correlations. Passed directly to |
parameters |
Data frame indicating parameter sets to pass to |
plotsfile |
A PDF output file name for drawing plots to help assess assignment quality. Can be |
usePops |
If |
... |
Additional parameters to pass to |
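The parameters data frame can be assembled in base R before calling processDatasetAllo. The sketch below assumes that each row is read as one parameter set, which is how the default under Usage appears to be laid out. The first data frame reproduces that default. The second, built with expand.grid, uses illustrative values that are assumptions rather than recommendations.

```r
# Four parameter sets, matching the default shown under Usage.
params <- data.frame(tolerance   = c(0.05, 0.05, 0.05, 0.05),
                     swap        = c(TRUE, FALSE, TRUE, FALSE),
                     null.weight = c(0.5, 0.5, 0, 0))

# A full grid of combinations; the values are illustrative assumptions only.
grid <- expand.grid(tolerance   = c(0.05, 0.1),
                    swap        = c(TRUE, FALSE),
                    null.weight = c(0.5, 0),
                    KEEP.OUT.ATTRS = FALSE)

nrow(params)  # number of parameter sets in the default
nrow(grid)    # number of parameter sets in the grid
```

Either data frame has the three columns named in the default (tolerance, swap, null.weight). Larger grids mean more runs per locus, so a small grid is a sensible starting point.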
Details
plotSSAllo produces a plot of loci by population, with the sums-of-squares ratio on the x-axis and the evenness of allele distribution on the y-axis (see Value). Locus names are written directly on the plot. If there are multiple populations, locus names are colored by population and a color legend is provided. Loci with high-quality allele clustering are expected to fall in the upper-right quadrant of the plot. Locus names in italics indicate that positive correlations were found between some alleles, which suggests population structure or scoring error that could interfere with assignment quality.
plotParamHeatmap produces an image showing, for each locus and parameter set, either the proportion of alleles found to be homoplasious or the proportion of genotypes that could not be unambiguously recoded using allele assignments. Homoplasy can be shown for a given population or merged across populations; the proportion of non-recodeable genotypes is shown merged across populations. Darker colors indicate more homoplasy or more genotypes that could not be recoded. The word “best” marks, for each locus, the parameter set that found the least homoplasy or the smallest number of non-recodeable genotypes.
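The selection that the word “best” marks can be sketched in base R as a row-wise minimum over a matrix laid out like propMat. The matrix values, locus names, and parameter-set names below are hypothetical. Ties go to the first minimum here, which is an assumption of this sketch and not necessarily how the package resolves them.

```r
# Hypothetical proportions: loci in rows, parameter sets in columns.
propMat <- matrix(c(0.10, 0.00, 0.25,
                    0.05, 0.20, 0.05),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("LocA", "LocB"),
                                  c("set1", "set2", "set3")))

# Column index of the smallest value in each row. which.min() returns the
# first minimum, so ties go to the earlier parameter set in this sketch.
best <- apply(propMat, 1, which.min)

# Translate the indices into parameter-set names.
bestName <- colnames(propMat)[best]
names(bestName) <- rownames(propMat)
bestName
```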
By default, processDatasetAllo generates a PDF file containing output from plotSSAllo and plotParamHeatmap, as well as heatmaps of the $heatmap.dist output of alleleCorrelations for each locus and population. Heatmaps are not plotted for loci where an allele is present in all individuals. processDatasetAllo also generates a list of R objects containing allele assignments under different parameters, as well as statistics for evaluating clustering quality and choosing the optimal parameter sets, as described below.
Value
plotSSAllo draws a plot and invisibly returns a list:
ssratio |
A two-dimensional array with loci in the first dimension and populations in the second dimension. Each value is the sums-of-squares between isoloci divided by the total sums-of-squares, as output by K-means clustering. If K-means clustering was not performed, the value is zero. |
evenness |
An array of the same dimensions as
where |
max.evenness |
The maximum possible value for |
min.evenness |
The minimum possible value for |
posCor |
An array of the same dimensions as |
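The ssratio value described above, between-cluster sums-of-squares divided by total sums-of-squares, can be illustrated with base R's kmeans, which reports both quantities. This is a minimal sketch on hypothetical toy points, not allele data, and it does not reproduce what polysat clusters internally. The nstart value is chosen only to give a stable result on the toy data.

```r
set.seed(1)

# Hypothetical data: two well-separated groups of ten points in two dimensions.
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2))

# K-means with two clusters and several random starts.
km <- kmeans(x, centers = 2, nstart = 50)

# Between-cluster sums-of-squares over total sums-of-squares.
# Values near 1 mean the clusters account for most of the variation.
ssratio <- km$betweenss / km$totss
ssratio
```

A ratio near zero would mean the two clusters explain little of the variation, which is why loci with cleaner clustering sit further to the right in the plotSSAllo plot.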
processDatasetAllo returns a list:
AlCorrArray |
A two-dimensional list with loci in the first dimension and populations in the second
dimension, giving the results of |
TAGarray |
A three-dimensional list with loci in the first dimension, populations in the second dimension,
and parameter sets in the third dimension, giving the results of |
plotSS |
The output of |
propHomoplasious |
A three-dimensional array, with the same dimensions as |
mergedAssignments |
A two-dimensional list, with loci in the first dimension and parameter sets in the
second dimension, containing allele assignments merged across populations. This is the output of
|
propHomoplMerged |
A two-dimensional array, of the same dimensions as |
missRate |
A matrix with the same dimensions as |
bestAssign |
A one-dimensional list with a single best set of allele assignments, from |
plotParamHeatmap draws a plot and does not return anything.
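Several components returned by processDatasetAllo are described above as two- or three-dimensional lists. In base R, such an object is a list with dim and dimnames attributes. The sketch below builds a small one by hand to show how it is indexed. The entries are placeholders, and the locus and population names are assumptions borrowed from the Examples.

```r
# Placeholder entries standing in for per-locus, per-population results,
# given in column-major order (first population first).
entries <- list(list(id = "Loc3/Pop1"), list(id = "Loc6/Pop1"),
                list(id = "Loc3/Pop2"), list(id = "Loc6/Pop2"))

# Adding dim and dimnames turns the plain list into a 2 x 2 list-matrix.
dim(entries) <- c(2, 2)
dimnames(entries) <- list(c("Loc3", "Loc6"), c("Pop1", "Pop2"))

# One cell, extracted with double brackets.
oneCell <- entries[["Loc6", "Pop2"]]
oneCell$id

# All populations for one locus, returned as a list.
oneLocus <- entries["Loc3", ]
length(oneLocus)
```

Double brackets return the stored object itself, while single brackets return a list of cells, so double brackets are usually what is wanted when inspecting the result for one locus and population.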
Author(s)
Lindsay V. Clark
References
Clark, L. V. and Drauch Schreier, A. (2017) Resolving microsatellite genotype ambiguity in populations of allopolyploid and diploidized autopolyploid organisms using negative correlations between allelic variables. Molecular Ecology Resources, 17, 1090–1103. DOI: 10.1111/1755-0998.12639.
See Also
alleleCorrelations, recodeAllopoly
Examples
# get example dataset
data(AllopolyTutorialData)
# data cleanup
mydata <- deleteSamples(AllopolyTutorialData, c("301", "302", "303"))
PopInfo(mydata) <- rep(1:2, each = 150)
Genotype(mydata, 43, 2) <- Missing(mydata)
# allele assignments
# R is set to 10 here to speed up processing for this example. It should typically be left at the default.
myassign <- processDatasetAllo(mydata, loci = c("Loc3", "Loc6"),
                               plotsfile = NULL, usePops = TRUE, R = 10,
                               parameters = data.frame(tolerance = c(0.5, 0.5),
                                                       swap = c(TRUE, FALSE),
                                                       null.weight = c(0.5, 0.5)))
# view best assignments for each locus
myassign$bestAssign
# plot K-means results
plotSSAllo(myassign$AlCorrArray)
# plot proportion of homoplasious alleles
plotParamHeatmap(myassign$propHomoplasious, "Pop1")
plotParamHeatmap(myassign$propHomoplasious, "Pop2")
plotParamHeatmap(myassign$propHomoplMerged, "Merged across populations")
# plot proportion of missing data, after recoding, for each locus and parameter set
plotParamHeatmap(myassign$missRate, main = "Missing data:")