seq_cluster {bioseq} | R Documentation |
Cluster sequences by similarity
Description
Cluster sequences by similarity
Usage
seq_cluster(x, threshold = 0.05, method = "complete")
Arguments
x |
a DNA, RNA or AA vector of sequences to be clustered. |
threshold |
Dissimilarity threshold used to delimit clusters (range in [0, 1]). |
method |
the clustering method (see details). |
Details
The function uses the ape functions dist.dna and dist.aa to compute pairwise distances among sequences, and hclust for clustering.
Computing a full pairwise distance matrix can be computationally expensive, because the number of sequence pairs grows quadratically with the number of sequences. It is therefore recommended to use this function on moderately sized datasets.
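To get a feel for the cost, the number of pairwise distances can be computed directly. This is only an illustrative sketch in base R; `n_pairs` is a hypothetical helper and not part of bioseq.

```r
# Hypothetical helper: number of distinct sequence pairs among n sequences,
# i.e. the number of entries in the lower triangle of the distance matrix.
n_pairs <- function(n) n * (n - 1) / 2

# 100, 1000 and 10000 sequences require 4950, 499500 and 49995000
# pairwise comparisons, respectively.
pairs <- n_pairs(c(100, 1000, 10000))
pairs
```

A tenfold increase in the number of sequences therefore means roughly a hundredfold increase in the number of distances to compute and store.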
Supported methods are:
- "single" (= Nearest Neighbour Clustering)
- "complete" (= Farthest Neighbour Clustering)
- "average" (= UPGMA)
- "mcquitty" (= WPGMA)
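The pipeline described in this section (pairwise distances, then hierarchical clustering) can be sketched in base R as follows. This is not the package's implementation: the hypothetical helper `p_distance` is a simple proportion of mismatching sites that stands in for ape's dist.dna, `sketch_cluster` is likewise a made-up name, and cutting the tree at height `threshold` with cutree is an assumption about how groups are derived.

```r
# Hypothetical helper: proportion of differing sites between two aligned
# sequences of equal length, ignoring positions where either has a gap ("-").
# It stands in for ape::dist.dna and is not the package's own code.
p_distance <- function(a, b) {
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  keep <- a != "-" & b != "-"
  mean(a[keep] != b[keep])
}

# Hypothetical sketch of the distance -> hclust -> group membership steps.
sketch_cluster <- function(seqs, threshold = 0.05, method = "complete") {
  n <- length(seqs)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- p_distance(seqs[i], seqs[j])
    }
  }
  tree <- stats::hclust(stats::as.dist(d), method = method)
  # Assumption: groups are obtained by cutting the tree at height `threshold`.
  stats::cutree(tree, h = threshold)
}

seqs <- c("ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "TTTTACGTAC")
groups_loose  <- sketch_cluster(seqs, threshold = 0.15)
groups_strict <- sketch_cluster(seqs, threshold = 0.05)
```

With these toy sequences, the third sequence differs from the first two at one site out of ten (distance 0.1), so it joins their group at a threshold of 0.15 but forms its own group at 0.05. Under the same assumption, a different `method` changes only how distances between groups are summarised when the tree is built.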
Value
An integer vector with group memberships.
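As a small illustration of how such a membership vector can be used, the sketch below groups sequences by cluster with base R. The `groups` vector here is a made-up stand-in for a return value, not actual output of seq_cluster.

```r
# Toy sequences and a hypothetical membership vector (one integer per sequence).
seqs   <- c("ACGT", "ACGA", "TTTT", "ACGT", "TTTA")
groups <- c(1L, 1L, 2L, 1L, 2L)

# List of sequences per cluster, named by cluster id.
by_cluster <- split(seqs, groups)

# Number of sequences in each cluster.
cluster_sizes <- table(groups)
```

The position of each integer matches the position of the corresponding sequence in the input, which is what makes `split` and `table` directly applicable.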
See Also
Function seq_consensus to compute consensus and representative sequences for clusters.
Other aggregation operations:
seq_consensus()
Examples
x <- c("-----TACGCAGTAAAAGCTACTGATG",
"CGTCATACGCAGTAAAAACTACTGATG",
"CTTCATACGCAGTAAAAACTACTGATG",
"CTTCATATGCAGTAAAAACTACTGATG",
"CTTCATACGCAGTAAAAACTACTGATG",
"CGTCATACGCAGTAAAAGCTACTGATG",
"CTTCATATGCAGTAAAAGCTACTGACG")
x <- dna(x)
seq_cluster(x)
[Package bioseq version 0.1.4 Index]