haplotype {pegas} | R Documentation |
Haplotype Extraction and Frequencies
Description
haplotype extracts the haplotypes from a set of DNA sequences. The result can be plotted with the appropriate function.
Usage
haplotype(x, ...)
## S3 method for class 'DNAbin'
haplotype(x, labels = NULL, strict = FALSE,
trailingGapsAsN = TRUE, ...)
## S3 method for class 'character'
haplotype(x, labels = NULL, ...)
## S3 method for class 'numeric'
haplotype(x, labels = NULL, ...)
## S3 method for class 'haplotype'
plot(x, xlab = "Haplotype", ylab = "Number", ...)
## S3 method for class 'haplotype'
print(x, ...)
## S3 method for class 'haplotype'
summary(object, ...)
## S3 method for class 'haplotype'
sort(x,
decreasing = ifelse(what == "frequencies", TRUE, FALSE),
what = "frequencies", ...)
## S3 method for class 'haplotype'
x[...]
Arguments
x |
a set of DNA sequences (as an object of class
|
object |
an object of class "haplotype". |
labels |
a vector of character strings used as names for the rows of the returned object. By default, Roman numerals are given. |
strict |
a logical value; if |
trailingGapsAsN |
a logical value; if |
xlab , ylab |
labels for the x- and y-axes. |
... |
further arguments passed to
|
decreasing |
a logical value specifying in which order to sort the haplotypes; by default this depends on the value of what. |
what |
a character specifying on what feature the haplotypes should be sorted: this must be "frequencies" or "labels". |
Details
The way ambiguities in the sequences are taken into account is explained in a post to r-sig-phylo (see the examples below):
https://www.mail-archive.com/r-sig-phylo@r-project.org/msg05541.html
The sort method sorts the haplotypes by decreasing frequency (the default) or in alphabetical order of their labels (if what = "labels"). Note that if these labels are Roman numerals (as assigned by haplotype), their alphabetical order may differ from their numerical order (e.g., IX is alphabetically before VIII).
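The ordering caveat above can be reproduced in base R without any sequence data. The following is only an illustrative sketch: the ten labels stand in for hypothetical haplotype labels, and the names labs, alpha and numer are not part of pegas.

```r
# Roman-numeral labels for ten hypothetical haplotypes (illustration only)
labs <- as.character(utils::as.roman(1:10))

# Alphabetical order; method = "radix" sorts with C-locale collation,
# so the result does not depend on the session locale
alpha <- sort(labs, method = "radix")

# Numerical order, recovered by converting the labels back to integers
numer <- labs[order(as.integer(utils::as.roman(labs)))]

print(alpha)  # alphabetical: "IX" comes before "VIII"
print(numer)  # numerical: same order as 'labs'
```

If numerical order matters after sorting on labels, supplying custom labels through the labels argument (for instance zero-padded numbers) may be a simpler option.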
From pegas 0.7, haplotype extracts haplotypes taking into account base ambiguities (see Note below).
Value
haplotype returns an object of class c("haplotype", "DNAbin") which is an object of class "DNAbin" with two additional attributes: "index", a list giving for each haplotype the indices of the observations that share it, and "from", giving the name of the original data.
sort returns an object of the same class respecting its attributes.
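As a minimal sketch of how the returned object might be inspected, assuming pegas is installed (it attaches ape, which provides the woodmouse data); the variable names idx and freq are chosen here for illustration only:

```r
library(pegas)  # assumed installed; also attaches ape

data(woodmouse)
h <- haplotype(woodmouse)

# The object carries both classes described above
class(h)

# "index": one list element per haplotype, holding the row indices
# of the original sequences assigned to that haplotype
idx <- attr(h, "index")

# "from": the name of the object the haplotypes were extracted from
attr(h, "from")

# Haplotype frequencies, named by the haplotype labels
freq <- setNames(lengths(idx), labels(h))
print(freq)
```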
Note
The presence of ambiguous bases and/or alignment gaps in DNA sequences can make the interpretation of haplotypes difficult. It is recommended to check their distributions with image.DNAbin and base.freq (using the options in both functions). Comparing the results obtained with different settings of the options strict and trailingGapsAsN of haplotype.DNAbin may be useful. Note that the ape function seg.sites has the same two options (as of ape 5.4), which may be useful to find the relevant sites in the sequence alignment.
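The checks suggested in this note could look roughly like the sketch below. It assumes ape (version 5.4 or later, for the seg.sites options) is installed and uses the woodmouse data only as a stand-in for a real alignment; the names bf, s_default and s_strict are illustrative.

```r
library(ape)  # assumed installed, version >= 5.4

data(woodmouse)

# Proportions of all states, including ambiguity codes, 'n', '-' and '?'
bf <- base.freq(woodmouse, all = TRUE)
print(bf[bf > 0])

# Show where the missing data ('n') are located in the alignment
image(woodmouse, "n", "blue")

# Segregating sites with the default handling of ambiguities and gaps ...
s_default <- seg.sites(woodmouse)
# ... and treating ambiguities and gaps as separate characters
s_strict <- seg.sites(woodmouse, strict = TRUE)

# Sites flagged only under one of the two settings
setdiff(s_strict, s_default)
```

Sites that appear under only one setting are natural candidates to inspect when haplotype() gives different results with and without strict = TRUE.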
Note
There are cases where the algorithm that pools the different sequences into haplotypes has difficulties, although it seems to require a specific configuration of missing/ambiguous data. The last example below is one of them.
Author(s)
Emmanuel Paradis
See Also
haploNet, haploFreq, subset.haplotype, DNAbin for manipulation of DNA sequences in R.
The haplotype method for objects of class "loci" is documented separately: haplotype.loci.
Examples
## generate some artificial data from 'woodmouse':
data(woodmouse)
x <- woodmouse[sample(15, size = 110, replace = TRUE), ]
(h <- haplotype(x))
## the indices of the individuals belonging to the 1st haplotype:
attr(h, "index")[[1]]
plot(sort(h))
## get the frequencies in a named vector:
setNames(lengths(attr(h, "index")), labels(h))
## data posted by Hirra Farooq on r-sig-phylo (see link above):
cat(">[A]\nCCCGATTTTATATCAACATTTATTT------",
">[D]\nCCCGATTTT----------------------",
">[B]\nCCCGATTTTATATCAACATTTATTT------",
">[C]\nCCCGATTTTATATCACCATTTATTTTGATTT",
file = "x.fas", sep = "\n")
x <- read.dna("x.fas", "f")
unlink("x.fas")
## show the sequences and the distances:
alview(x)
dist.dna(x, "N", p = TRUE)
## by default there are 3 haplotypes with a warning about ambiguity:
haplotype(x)
## the same 3 haplotypes without warning:
haplotype(x, strict = TRUE)
## if we remove the last sequence there is, by default, a single haplotype:
haplotype(x[-4, ])
## to get two haplotypes separately as with the complete data:
haplotype(x[-4, ], strict = TRUE)
## a simpler example:
y <- as.DNAbin(matrix(c("A", "A", "A", "A", "R", "-"), 3))
haplotype(y) # 1 haplotype
haplotype(y, strict = TRUE) # 3 haplotypes
haplotype(y, trailingGapsAsN = FALSE) # 2 haplotypes
## a tricky example with 4 sequences and 1 site:
z <- as.DNAbin(matrix(c("Y", "A", "R", "N"), 4))
alview(z, showpos = FALSE)
## a single haplotype is identified:
haplotype(z)
## 'Y' has zero-distance with (and only with) 'N', so they are pooled
## together; at a later iteration of this pooling step, 'N' has
## zero-distance with 'R' (and ultimately with 'A') so they are pooled.
## if the sequences are ordered differently, 'Y' and 'A' are separated:
haplotype(z[c(4, 1:3), ])