amCluster {allelematch}    R Documentation
Clustering of multilocus genotypes
Description
Performs clustering of multilocus genotypes to identify unique consensus and singleton genotypes, and generates analysis output as formatted text, HTML, or CSV. These functions are usually called by amUnique; this interface remains available to give a better understanding of how amUnique operates. For more information, see the examples.
There are three steps to this analysis: (1) compute the dissimilarity between pairs of genotypes using a metric that takes missing data into account, (2) cluster this dissimilarity matrix using a standard hierarchical agglomerative clustering approach, and (3) use a dynamic tree cutting approach to identify clusters.
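The three steps can be illustrated with a small base-R sketch. It is not allelematch's implementation: the missing-data rule in the hypothetical missingAwareDissim() (a comparison involving a missing allele counts as half a mismatch) is an assumption standing in for the amMatrix metric, a fixed-height cutree() stands in for the dynamic tree cut, and the toy genotypes are invented.

```r
# Base-R sketch of the three steps; NOT allelematch's implementation.
# Assumptions: one allele per column, NA marks a missing allele, and the
# dissimilarity rule below is a hypothetical stand-in for the amMatrix metric.

# Invented toy data: 5 samples, 3 loci (2 allele columns per locus)
geno <- rbind(
  s1 = c(101, 103, 200, 204, 310, 312),
  s2 = c(101, 103, 200, 204, 310, NA),   # s1 with one allele missing
  s3 = c(101, 103, 200, 204, 310, 314),  # s1 with one allele mismatching
  s4 = c(105, 107, 210, 212, 320, 322),
  s5 = c(105, 107, 210, 212, NA,  NA)    # s4 with one locus missing
)

# Step 1: pairwise dissimilarity that accounts for missing data.
# Hypothetical rule: a comparison involving a missing allele counts as half
# a mismatch; the result is the mean mismatch over all allele columns.
missingAwareDissim <- function(a, b) {
  mean(ifelse(is.na(a) | is.na(b), 0.5, as.numeric(a != b)))
}
n <- nrow(geno)
d <- matrix(0, n, n, dimnames = list(rownames(geno), rownames(geno)))
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    d[i, j] <- d[j, i] <- missingAwareDissim(geno[i, ], geno[j, ])
  }
}

# Step 2: standard hierarchical agglomerative clustering
tree <- hclust(as.dist(d), method = "complete")

# Step 3: cut the tree into clusters. amCluster uses a dynamic (hybrid) cut;
# a fixed-height cutree() is used here only as a simpler stand-in.
clusters <- cutree(tree, h = 0.3)
print(clusters)  # s1, s2, s3 share one cluster; s4, s5 share another
```

With this toy data the near-duplicates of s1 (one missing allele, one mismatching allele) fall within the cut height and are grouped, as are s4 and s5.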
Usage
amCluster(
amDatasetFocal,
runUntilSingletons = TRUE,
cutHeight = 0.3,
missingMethod = 2,
consensusMethod = 1,
clusterMethod = "complete"
)
amHTML.amCluster(
x,
htmlFile = NULL,
htmlCSS = amCSSForHTML()
)
amCSV.amCluster(
x,
csvFile
)
## S3 method for class 'amCluster'
summary(
object,
html = NULL,
csv = NULL,
...
)
Arguments
amDatasetFocal |
An |
runUntilSingletons |
When |
cutHeight |
Sets the tree cutting height using the hybrid method in the |
missingMethod |
The method used to determine the similarity of multilocus genotypes when data are missing. See amMatrix. |
consensusMethod |
The method (an integer) used to determine the consensus multilocus genotype from a cluster
of multilocus genotypes. |
clusterMethod |
The method used by |
object , x |
An |
htmlFile |
HTML filepath to create. |
htmlCSS |
String containing a valid cascading style sheet. |
html |
If |
csvFile , csv |
CSV filepath to create containing only the unique genotypes determined in the clustering. |
... |
Additional arguments to |
Details
Selecting an appropriate cutHeight parameter (also known as the d-hat criterion) is essential. Typically this function is called from amUnique, and the conversion between alleleMismatch (m-hat) and cutHeight (d-hat) is done automatically. An appropriate value for alleleMismatch (m-hat) can be selected using amUniqueProfile. See the supplementary documentation for an explanation of how these parameters are related.
runUntilSingletons=TRUE provides an efficient and reliable way to determine the unique individuals in a dataset if the dataset meets certain criteria. To see how the clustering thins the dataset, run this recursion manually using runUntilSingletons=FALSE. An example is provided below.
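The recursion performed when runUntilSingletons=TRUE can be sketched in base R as repeated single passes, each collapsing every cluster to one representative, until a pass finds only singletons. This is a hypothetical illustration, not allelematch code: dissimMatrix(), clusterOnce() and recurseUntilSingletons() are invented names, the representative is picked by a simplified reading of consensusMethod 1 (smallest summed dissimilarity), and the half-mismatch rule for missing alleles and the static cutree() are stand-ins for amMatrix and the hybrid cut.

```r
# Hypothetical sketch of the runUntilSingletons recursion; NOT allelematch code.
# Assumptions: NA marks a missing allele, a missing comparison counts as half a
# mismatch (stand-in for amMatrix), and a static cutree() replaces the hybrid cut.

dissimMatrix <- function(g) {
  n <- nrow(g)
  d <- matrix(0, n, n, dimnames = list(rownames(g), rownames(g)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      a <- g[i, ]
      b <- g[j, ]
      d[i, j] <- d[j, i] <- mean(ifelse(is.na(a) | is.na(b), 0.5, as.numeric(a != b)))
    }
  }
  d
}

# One pass: cluster, then keep one representative per cluster (the member
# with the smallest summed dissimilarity to the other members).
clusterOnce <- function(g, cutHeight) {
  if (nrow(g) < 2) return(list(unique = g, nClusters = 0))
  d <- dissimMatrix(g)
  groups <- cutree(hclust(as.dist(d), method = "complete"), h = cutHeight)
  keep <- sapply(split(seq_len(nrow(g)), groups), function(idx) {
    idx[which.min(rowSums(d[idx, idx, drop = FALSE]))]
  })
  list(unique = g[sort(keep), , drop = FALSE],
       nClusters = sum(table(groups) > 1))  # clusters with 2+ members
}

# Repeat single passes until a pass finds only singletons
recurseUntilSingletons <- function(g, cutHeight) {
  passes <- 0
  repeat {
    res <- clusterOnce(g, cutHeight)
    passes <- passes + 1
    g <- res$unique
    if (res$nClusters == 0) break  # no more clusters, therefore stop
  }
  list(unique = g, passes = passes)
}

# Invented toy data
geno <- rbind(
  s1 = c(101, 103, 200, 204, 310, 312),
  s2 = c(101, 103, 200, 204, 310, NA),
  s3 = c(101, 103, 200, 204, 310, 314),
  s4 = c(105, 107, 210, 212, 320, 322),
  s5 = c(105, 107, 210, 212, NA,  NA)
)
res <- recurseUntilSingletons(geno, cutHeight = 0.3)
print(res$passes)            # 2
print(rownames(res$unique))  # "s2" "s4"
```

Each pass that finds a cluster removes at least one genotype, so the loop always terminates; the stopping rule matches the "No more clusters, therefore stop" step of the manual example.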
In practice, cutHeight gives the amount of dissimilarity (using the metric described in amMatrix) required for two multilocus genotypes to be declared different (also known as d-hat). The default setting for consensusMethod performs well.
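To build intuition for cutHeight as a dissimilarity threshold, the following sketch cuts a complete-linkage tree at two fixed heights for a hand-written dissimilarity matrix. It assumes a static cutree() cut, which is simpler than the hybrid dynamic cut amCluster uses, and the labels g1 to g4 and their dissimilarities are invented. Under complete linkage, genotypes share a cluster only if all of their pairwise dissimilarities are at most the cut height.

```r
# Hypothetical illustration of cutHeight as a dissimilarity threshold.
# Assumes a static complete-linkage cut (cutree), not amCluster's hybrid cut;
# the labels g1..g4 and their dissimilarities are invented.
labs <- paste0("g", 1:4)
m <- matrix(c(
  0.00, 0.10, 0.25, 0.90,
  0.10, 0.00, 0.20, 0.85,
  0.25, 0.20, 0.00, 0.80,
  0.90, 0.85, 0.80, 0.00
), nrow = 4, byrow = TRUE, dimnames = list(labs, labs))
tree <- hclust(as.dist(m), method = "complete")

# Complete linkage merges two groups at the LARGEST pairwise dissimilarity
# between them, so every pair inside a cluster cut at height h is within h.
strict  <- cutree(tree, h = 0.15)  # {g1, g2}, {g3}, {g4}
lenient <- cutree(tree, h = 0.30)  # {g1, g2, g3}, {g4}
print(strict)
print(lenient)
```

Raising the cut height makes more genotypes count as "the same", which is why a poorly chosen d-hat can merge distinct individuals or split one individual.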
consensusMethod | Method |
1 | Genotype with maximum similarity to the others in the cluster is the consensus (DEFAULT) |
2 | Genotype with maximum similarity to the others in the cluster is the consensus, then missing alleles are interpolated using the modal non-missing allele in each column |
3 | Genotype with the least missing data is the consensus |
4 | Genotype with the least missing data is the consensus, then missing alleles are interpolated using the modal non-missing allele in each column |
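The four consensusMethod options can be sketched in base R for a single cluster. This is a hypothetical illustration, not allelematch code: consensusGenotype(), pairDissim() and modeNonMissing() are invented names, "similarity" is assumed to mean the smallest summed dissimilarity (with a comparison involving a missing allele counted as half a mismatch), and ties go to the first genotype or, for the mode, the smallest allele.

```r
# Hypothetical sketch of the four consensusMethod options for one cluster.
# Not allelematch code; NA marks a missing allele.

pairDissim <- function(x, y) {
  mean(ifelse(is.na(x) | is.na(y), 0.5, as.numeric(x != y)))
}

# Most frequent non-missing allele in one column (NA if all are missing)
modeNonMissing <- function(col) {
  col <- col[!is.na(col)]
  if (length(col) == 0) return(NA)
  counts <- table(col)
  as.numeric(names(counts)[which.max(counts)])
}

consensusGenotype <- function(cluster, method = 1) {
  n <- nrow(cluster)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i != j) d[i, j] <- pairDissim(cluster[i, ], cluster[j, ])
    }
  }
  pick <- if (method %in% c(1, 2)) {
    which.min(rowSums(d))               # methods 1, 2: max similarity
  } else {
    which.min(rowSums(is.na(cluster)))  # methods 3, 4: least missing data
  }
  consensus <- cluster[pick, ]
  gaps <- is.na(consensus)
  if (method %in% c(2, 4) && any(gaps)) {
    # interpolate each missing allele with its column's modal non-missing allele
    consensus[gaps] <- apply(cluster[, gaps, drop = FALSE], 2, modeNonMissing)
  }
  unname(consensus)
}

# Invented example cluster: 'a' is most similar to the rest but has a gap
cluster <- rbind(
  a = c(101, 103, 200, NA),
  b = c(101, 103, 202, 204),
  c = c(101, 105, 200, 204),
  d = c(107, 103, 200, 204)
)
for (m in 1:4) print(consensusGenotype(cluster, m))
```

Here methods 1 and 2 both select a, with method 2 filling its missing allele with 204; methods 3 and 4 select b, which has no missing data, so interpolation changes nothing.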
Value
An amCluster object, or side effects: an analysis summary written to an HTML file or to the console, or written to a CSV file.
Note
Calling with html = TRUE (or with htmlFile = NULL) has an additional side effect. If required, the operating system temporary directory where AlleleMatch stores temporary HTML files is cleaned up: files that match the pattern am*.html and are older than 24 hours are deleted from this directory.
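The clean-up described in this Note can be approximated in base R. This is a hypothetical sketch, not the package's own routine: cleanOldAmHtml() is an invented name, and only the am*.html pattern and the 24-hour threshold come from the Note.

```r
# Hypothetical approximation of the temporary-file clean-up; NOT allelematch code.
# Deletes files named am*.html in 'dir' whose modification time is more than
# 'maxAgeHours' hours old, and returns the deleted paths invisibly.
cleanOldAmHtml <- function(dir = tempdir(), maxAgeHours = 24) {
  # glob am*.html expressed as a regular expression on file names
  files <- list.files(dir, pattern = "^am.*\\.html$", full.names = TRUE)
  if (length(files) == 0) return(invisible(character(0)))
  age <- difftime(Sys.time(), file.info(files)$mtime, units = "hours")
  old <- files[as.numeric(age) > maxAgeHours]
  file.remove(old)
  invisible(old)
}
```

Matching on both the name pattern and the file age limits deletion to stale files this kind of routine would have created, leaving recent output and unrelated temporary files in place.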
Author(s)
Paul Galpern (pgalpern@gmail.com)
References
For a complete vignette, see Data S1, Supplementary documentation and tutorials (PDF), available at <doi:10.1111/j.1755-0998.2012.03137.x>.
See Also
amDataset, amMatrix, amPairwise, amUnique, amUniqueProfile
Examples
## Not run:
data("amExample5")
## Produce amDataset object
myDataset <-
amDataset(
amExample5,
missingCode = "-99",
indexColumn = 1,
metaDataColumn = 2,
ignoreColumn = "gender"
)
## Usage
myCluster <-
amCluster(
myDataset,
cutHeight = 0.2
)
## Display analysis as HTML in default browser
summary.amCluster(
myCluster,
html = TRUE
)
## Save analysis to HTML file
summary.amCluster(
myCluster,
html = "myCluster.htm"
)
## Display analysis as formatted text on the console
summary.amCluster(myCluster)
## Save unique genotypes only to a CSV file
summary.amCluster(
myCluster,
csv = "myCluster.csv"
)
## Demonstration of how amCluster operates
## Manual control over the recursion in amCluster()
## Pass 1: cluster the full dataset once
myCluster1 <- amCluster(myDataset, runUntilSingletons = FALSE, cutHeight = 0.2)
summary.amCluster(myCluster1, html = TRUE)
## Pass 2: recluster the unique genotypes returned by pass 1
myCluster2 <- amCluster(myCluster1$unique, runUntilSingletons = FALSE, cutHeight = 0.2)
summary.amCluster(myCluster2, html = TRUE)
## Pass 3
myCluster3 <- amCluster(myCluster2$unique, runUntilSingletons = FALSE, cutHeight = 0.2)
summary.amCluster(myCluster3, html = TRUE)
## Pass 4
myCluster4 <- amCluster(myCluster3$unique, runUntilSingletons = FALSE, cutHeight = 0.2)
summary.amCluster(myCluster4, html = TRUE)
## No more clusters, therefore stop.
## End(Not run)