R: Identification of unique genotypes

amUnique {allelematch}

R Documentation

Identification of unique genotypes

Description

Identifies unique genotypes and generates analysis output in formatted text, HTML, or CSV. Samples are clustered and matched based on their dissimilarity score (see amMatrix). Also calculated is the match probability, Psib, which is the probability that a sample is a sibling of a unique genotype (and therefore not a replicate sample) given the allele frequencies in a population consisting of only the unique genotypes (Wilberg & Dreher, 2004).

Usage

	amUnique(
		amDatasetFocal,
		multilocusMap = NULL,
		alleleMismatch = NULL,
		matchThreshold = NULL,
		cutHeight = NULL,
		doPsib = "missing",
		consensusMethod = 1,
		verbose = TRUE
		)

	amHTML.amUnique(
		x,
		htmlFile = NULL,
		htmlCSS = amCSSForHTML()
		)

	amCSV.amUnique(
		x,
		csvFile,
		uniqueOnly = FALSE
		)


	## S3 method for class 'amUnique'
	summary(
		object,
		html = NULL,
		csv = NULL,
		...
		)

Arguments

`amDatasetFocal`	An `amDataset` object containing genotypes in which an unknown number of individuals are sampled multiple times
`multilocusMap`	Optional. A vector of integers or strings giving the mappings onto loci for all genotype columns in amDatasetFocal. When omitted, columns are assumed to be paired (i.e., diploid loci with alleles in adjacent columns). See details.
`alleleMismatch`	Optional. Maximum number of mismatching alleles which will be tolerated when identifying individuals; also known as m-hat parameter. If specified, then `matchThreshold` and `cutHeight` should be omitted. All three parameters are related. See details.
`matchThreshold`	Optional. Minimum similarity score (1 minus the dissimilarity score) which constitutes a match when identifying individuals; also known as s-hat parameter. If specified, then `alleleMismatch` and `cutHeight` should be omitted. All three parameters are related. See details.
`cutHeight`	Optional. The `cutHeight` parameter used in dynamic tree cutting by `amCluster`; also known as d-hat parameter. If specified, then `alleleMismatch` and `matchThreshold` should be omitted. All three parameters are related. See details.
`doPsib`	String specifying how match probability should be calculated. See details.
`consensusMethod`	The method (an integer) used to determine the consensus multilocus genotype from a cluster of multilocus genotypes. See `amCluster` for details. Typically the default is adequate.
`verbose`	If `verbose = TRUE`, report the progress of the analysis to the console. Useful with datasets consisting of thousands of samples where progress may be slow.
`object`, `x`	An `amUnique` object.
`htmlFile`	HTML filepath to create. If `htmlFile = NULL`, a file is created in the operating system temporary directory and is then opened in the default browser.
`htmlCSS`	A string containing a valid cascading style sheet. A default style sheet is provided in `amCSSForHTML`. See `amCSSForHTML` for details of how to tweak this CSS.
`html`	If `html = TRUE`, the `summary.amUnique` method produces and loads an HTML file in the default browser. `html` can also contain a path to a file where HTML output will be written. Note that `summary.amUnique` does not produce formatted output for the console.
`csvFile`, `csv`	CSV filepath to create containing a representation of the `amUnique` analysis.
`uniqueOnly`	If `uniqueOnly = TRUE`, only the unique genotypes will be saved to a CSV, with no additional information associated with the analysis.
`...`	Additional arguments to `summary.amUnique`

Details

Only one of alleleMismatch, cutHeight, or matchThreshold can be specified, as the three parameters are related.

alleleMismatch is the most intuitive way to understand how the identification of unique genotypes proceeds. For example, a setting of alleleMismatch = 4 implies that up to four alleles may differ between samples that represent the same individual. In practice, however, this value is only an approximation of the amount of mismatch that may be tolerated. This is because the clustering process used to identify unique genotypes, and the subsequent matching that identifies samples matching these unique genotypes, are both based on a dissimilarity metric or score (see amMatrix) that incorporates both allele mismatches and missing data. alleleMismatch is therefore not used directly in the analysis. Instead, it is converted as follows: cutHeight, a parameter of amCluster (which is called from this function), is set to cutHeight = alleleMismatch/(number of allele columns); matchThreshold, a parameter of amPairwise (also called from this function), is set to matchThreshold = 1 - cutHeight.
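
The relationship between the three parameters can be sketched in base R. This is only an illustration of the arithmetic stated above, not the package's internal code; the helper name deriveThresholds and its argument nAlleleColumns are hypothetical.

```r
# Hypothetical helper (not part of allelematch): derive the related parameters
# from whichever single one is supplied, using the relationships stated above:
#   cutHeight      = alleleMismatch / (number of allele columns)
#   matchThreshold = 1 - cutHeight
deriveThresholds <- function(nAlleleColumns,
                             alleleMismatch = NULL,
                             matchThreshold = NULL,
                             cutHeight = NULL) {
  given <- c(!is.null(alleleMismatch), !is.null(matchThreshold), !is.null(cutHeight))
  if (sum(given) != 1) {
    stop("Specify exactly one of alleleMismatch, matchThreshold, cutHeight")
  }
  if (!is.null(alleleMismatch)) {
    cutHeight <- alleleMismatch / nAlleleColumns
  } else if (!is.null(matchThreshold)) {
    cutHeight <- 1 - matchThreshold
  }
  list(
    alleleMismatch = cutHeight * nAlleleColumns,
    cutHeight = cutHeight,
    matchThreshold = 1 - cutHeight
  )
}

# Example: 20 allele columns, tolerating up to 3 mismatching alleles
thresholds <- deriveThresholds(nAlleleColumns = 20, alleleMismatch = 3)
thresholds$cutHeight       # 0.15
thresholds$matchThreshold  # 0.85
```

Supplying more than one of the three values raises an error in this sketch, mirroring the rule that only one may be specified.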

Selecting the appropriate value for alleleMismatch, cutHeight, or matchThreshold is an important task. Use amUniqueProfile to assist in this process. See the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.

doPsib = "missing" is the default and specifies that match probability Psib should be calculated for samples that match unique genotypes and have no allele mismatches, but may differ by having missing data. doPsib = "all" specifies that Psib should be calculated for all samples that match unique genotypes. In this case, if allele mismatches occur, alleles are assumed to be missing at the mismatching loci.
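
The difference between the two doPsib settings can be illustrated with a toy table in base R. This sketch only shows which matched samples would qualify for a Psib calculation under each setting, as described above; it does not compute Psib. The helper samplesForPsib, the column names, and the values are all hypothetical.

```r
# Toy table of samples that already matched a unique genotype (made-up values)
matched <- data.frame(
  sample = c("S1", "S2", "S3", "S4"),
  alleleMismatches = c(0, 0, 1, 2),
  missingAlleles = c(0, 2, 0, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: which matched samples would receive a Psib value
samplesForPsib <- function(matched, doPsib = c("missing", "all")) {
  doPsib <- match.arg(doPsib)
  if (doPsib == "missing") {
    # Only samples with no allele mismatches (they may still have missing data)
    matched$sample[matched$alleleMismatches == 0]
  } else {
    # Every matched sample; mismatching loci would be treated as missing
    matched$sample
  }
}

samplesForPsib(matched, "missing")  # "S1" "S2"
samplesForPsib(matched, "all")      # "S1" "S2" "S3" "S4"
```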

multilocusMap is often not required, as amDataset objects will typically consist of paired columns of genotypes, where each pair is a separate locus. When the columns are not all paired (e.g., gender occupies a single column), a map vector must be specified.

Example: amDataset consists of gender followed by 4 diploid loci in paired columns
multilocusMap = c(1, 2, 2, 3, 3, 4, 4, 5, 5)
or equally
multilocusMap = c("GENDER", "LOC1", "LOC1", "LOC2", "LOC2", "LOC3", "LOC3", "LOC4", "LOC4")
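
For datasets with many loci, such a map can be built programmatically rather than typed by hand. The following base R sketch assumes the layout from the example (one gender column followed by 4 diploid loci in paired columns); the variable names are illustrative only.

```r
# Integer map: one entry for the gender column, then two entries per diploid locus
integerMap <- c(1, rep(2:5, each = 2))
integerMap  # 1 2 2 3 3 4 4 5 5

# Equivalent map using locus names
stringMap <- c("GENDER", rep(paste0("LOC", 1:4), each = 2))
stringMap

# Both maps assign the nine columns to loci in the same way
sameGrouping <- identical(match(stringMap, unique(stringMap)), as.integer(integerMap))
sameGrouping  # TRUE

# A map needs exactly one entry per genotype column (nine in this layout)
nGenotypeColumns <- 9
stopifnot(length(integerMap) == nGenotypeColumns,
          length(stringMap) == nGenotypeColumns)
```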

For more information on selecting consensusMethod see amCluster. The default consensusMethod = 1 is typically adequate.

Value

amUnique object or side effects: analysis summary written to an HTML file or to the console, or written to a CSV file.

Note

There is an additional side effect of html = TRUE (or of htmlFile = NULL). If required, the operating system temporary directory where AlleleMatch temporary HTML files are stored is cleaned up: files that match the pattern am*.html and are older than 24 hours are deleted from this directory.
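
The kind of clean-up described in this note can be sketched in base R as follows. This is not the package's own code; the helper findStaleHtml and the scratch directory are hypothetical, and the demonstration runs in its own subdirectory so that no real files are affected.

```r
# Hypothetical helper: list am*.html files in a directory older than a cutoff
findStaleHtml <- function(dir, maxAgeHours = 24, now = Sys.time()) {
  candidates <- list.files(dir, pattern = utils::glob2rx("am*.html"), full.names = TRUE)
  ageHours <- as.numeric(difftime(now, file.mtime(candidates), units = "hours"))
  candidates[ageHours > maxAgeHours]
}

# Demonstration in a scratch subdirectory of the session temporary directory
scratch <- file.path(tempdir(), "amCleanupDemo")
dir.create(scratch, showWarnings = FALSE)
oldFile <- file.path(scratch, "am_old.html")
newFile <- file.path(scratch, "am_new.html")
file.create(oldFile, newFile)
Sys.setFileTime(oldFile, Sys.time() - 48 * 3600)  # pretend this file is two days old

stale <- findStaleHtml(scratch)
basename(stale)
unlink(stale)  # delete only the stale files; the recent file is kept
```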

Author(s)

Paul Galpern (pgalpern@gmail.com)

References

A complete vignette is available in the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.

Wilberg MJ, Dreher BP (2004) GENECAP: a program for analysis of multilocus genotype data for non-invasive sampling and capture-recapture population estimation. Molecular Ecology Notes, 4, 783-785.

Examples

	## Not run: 
	data("amExample2")

	## Produce amDataset object
	myDataset <-
		amDataset(
			amExample2,
			missingCode = "-99",
			indexColumn = 1,
			ignoreColumn = 2
			)

	## Usage
	## Optimal alleleMismatch parameter previously found using amUniqueProfile()
	myUnique <-
		amUnique(
		myDataset,
		alleleMismatch = 3
		)

	## Display analysis as HTML in default browser
	summary.amUnique(
		myUnique,
		html = TRUE
		)

	## Save analysis to HTML file
	summary.amUnique(
		myUnique,
		html = "myUnique.htm"
		)

	## Save analysis to a CSV file
	summary.amUnique(
		myUnique,
		csv = "myUnique.csv"
		)

	## Save unique genotypes only to a CSV file
	summary.amUnique(
		myUnique,
		csv = "myUnique.csv",
		uniqueOnly = TRUE
		)

	## Data set with gender information
	data("amExample5")

	## Produce amDataset object
	myDataset2 <-
		amDataset(
			amExample5,
			missingCode = "-99",
			indexColumn = 1,
			metaDataColumn = 2
			)

	## Usage
	## Optimal alleleMismatch parameter previously found using amUniqueProfile()
	myUnique2 <-
		amUnique(
			myDataset2,
			multilocusMap = c(1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10,
			11, 11),
			alleleMismatch = 3
			)

	
## End(Not run)

[Package allelematch version 2.5.4 Index]