amDataset {allelematch} | R Documentation |
Prepare a dataset for use with allelematch
Description
Given an input matrix or data.frame, produce an amDataset object suitable for use with
other allelematch functions.
Usage
amDataset(
multilocusDataset,
missingCode = "-99",
indexColumn = NULL,
metaDataColumn = NULL,
ignoreColumn = NULL
)
## S3 method for class 'amDataset'
print(x, ...)
Arguments
multilocusDataset |
A matrix or data.frame containing the multilocus genotype data. |
missingCode |
A character string giving the code used for missing data. |
indexColumn |
Optional. |
metaDataColumn |
Optional. |
ignoreColumn |
Optional. |
x |
An amDataset object. |
... |
Additional arguments to summary. |
Details
Examine amExampleData for an example of a typical input dataset in the diploid case.
Typically these files will be the CSV output from allele calling software. Sample index
or ID information and sample meta-data may be specified in two additional columns. Columns can
optionally be given names, and these are carried through analyses. If column names are not
given, appropriate names are produced.
Each datum is treated as a character string in allelematch functions, enabling the mixing
of numeric and alphanumeric data.
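As a rough illustration of what treating every datum as a character string means, the sketch below coerces a small mixed dataset to text before any comparison. The data frame, its column names, and the assumption that missing values arrive as NA are all hypothetical. The code uses base R only and does not show how amDataset performs this step internally.

```r
# Hypothetical genotype columns: one numeric, one alphanumeric
genotypes <- data.frame(
  locus1a = c(101, 103, NA),
  locus1b = c("A2", "B1", "A2"),
  stringsAsFactors = FALSE
)

# Coerce column by column so that numbers become plain strings
asText <- as.data.frame(lapply(genotypes, as.character),
                        stringsAsFactors = FALSE)

# Recode NA to the missing-data code used in this documentation ("-99")
asText[is.na(asText)] <- "-99"

print(asText)
```

Coercing each column with as.character avoids the width padding that can appear when a mixed data frame is converted with as.matrix.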
The multilocus dataset can contain any number of diploid or haploid markers, and these can be in
any order. In the diploid case there should be two columns for each locus (named, say,
locus1a and locus1b). Please note that allelematch functions pay no attention to
genetics. In other words, each column is considered a comparable state. Thus matching and
clustering of multilocus genotypes is done on the basis of superficial similarity of the data
matrix rows, rather than on any appreciation of the allelic states at each locus. See
amPairwise for more discussion.
For this reason it is important when working with diploid data to ensure that identical
individuals will have identical alleles in each column. This can be achieved by sorting each
locus so that in each case the lower length allele appears in, say, a column "locus1a" and the
higher in column "locus1b". This pattern is likely the default in allele calling software, and
sorting will typically not be required unless data are derived from an unusual source.
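If sorting is needed, one possible approach in base R is sketched below. It assumes that alleles are numeric lengths, that each locus occupies a known pair of columns, and that there are no NA values (missing data are taken to be a numeric code already). The helper sortLocusPairs and the example data are hypothetical and are not part of allelematch.

```r
# Hypothetical diploid data: both samples carry the same genotype,
# but the alleles at locus1 were recorded in opposite order
genotypes <- data.frame(
  locus1a = c(120, 124),
  locus1b = c(124, 120),
  locus2a = c(200, 200),
  locus2b = c(204, 204)
)

# Hypothetical helper: within each locus, put the lower allele in the
# first column of the pair and the higher allele in the second
sortLocusPairs <- function(df, pairs) {
  for (p in pairs) {
    a <- df[[p[1]]]
    b <- df[[p[2]]]
    df[[p[1]]] <- pmin(a, b)
    df[[p[2]]] <- pmax(a, b)
  }
  df
}

locusPairs <- list(c("locus1a", "locus1b"), c("locus2a", "locus2b"))
sorted <- sortLocusPairs(genotypes, locusPairs)

# After sorting, the two rows agree column by column
all(sorted[1, ] == sorted[2, ])
```

Before sorting, a column-by-column comparison of the two rows would report mismatches at locus1 even though the genotypes are the same.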
Only one meta-data column is possible with allelematch. If multiple columns must be
associated with a given sample for downstream analyses, try pasting them together into one
string with an appropriate separator, and separating them later when allelematch analyses are
concluded.
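A minimal sketch of this paste-and-split workaround, using base R only, might look as follows. The sample table, its column names, and the choice of "|" as separator are assumptions for illustration. Any separator that never occurs in the meta-data values would serve.

```r
# Hypothetical sample table with two meta-data fields
samples <- data.frame(
  sampleId = c("S1", "S2"),
  site = c("north", "south"),
  year = c("2010", "2011"),
  stringsAsFactors = FALSE
)

# Combine the fields into the single meta-data column
metaSep <- "|"
samples$meta <- paste(samples$site, samples$year, sep = metaSep)

# ... run analyses that carry the combined meta-data column through ...

# Afterwards, split the combined string back into its fields
parts <- strsplit(samples$meta, split = metaSep, fixed = TRUE)
recovered <- do.call(rbind, parts)
colnames(recovered) <- c("site", "year")
print(recovered)
```

Passing fixed = TRUE to strsplit matters here because "|" would otherwise be interpreted as a regular expression operator.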
Value
An amDataset object.
Author(s)
Paul Galpern (pgalpern@gmail.com)
References
For a complete vignette, see the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.
See Also
amPairwise, amUnique, amExampleData
Examples
## Not run:
data("amExample5")
## Typical usage
myDataset <-
  amDataset(
    amExample5,
    missingCode = "-99",
    indexColumn = 1,
    metaDataColumn = 2,
    ignoreColumn = "gender"
  )
## Access elements of amDataset object
myMetaData <- myDataset$metaData
mySamplingID <- myDataset$index
myAlleles <- myDataset$multilocus
## View the structure of amDataset object
unclass(myDataset)
## End(Not run)