| seq_cluster {bioseq} | R Documentation |
Cluster sequences by similarity
Description
Cluster sequences by similarity
Usage
seq_cluster(x, threshold = 0.05, method = "complete")
Arguments
x |
a DNA, RNA or AA vector of sequences to be clustered. |
threshold |
Threshold value (range in [0, 1]). |
method |
the clustering method (see details). |
Details
The function uses the ape functions dist.dna and dist.aa to compute pairwise distances among sequences, and hclust for clustering.
Computing a full pairwise distance matrix can be computationally expensive. It is recommended to use this function for moderately sized datasets.
Supported methods are:
- "single" (= Nearest Neighbour Clustering)
- "complete" (= Farthest Neighbour Clustering)
- "average" (= UPGMA)
- "mcquitty" (= WPGMA)
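The distance-then-cluster pipeline described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's code: p_distance is a hypothetical helper standing in for dist.dna (it computes a plain proportion of differing sites), the three sequences are made up, and the sketch assumes that the threshold is applied as the height at which the hclust tree is cut.

```r
# Hypothetical stand-in for ape::dist.dna(): proportion of differing sites
# between aligned sequences, ignoring positions where either one has a gap.
p_distance <- function(seqs) {
  chars <- strsplit(seqs, "")
  n <- length(chars)
  d <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      keep <- chars[[i]] != "-" & chars[[j]] != "-"
      d[i, j] <- mean(chars[[i]][keep] != chars[[j]][keep])
    }
  }
  as.dist(d)
}

# Made-up aligned sequences: s1-s2 differ at 1 of 20 sites (0.05),
# s2-s3 at 2 sites (0.10), s1-s3 at 3 sites (0.15).
seqs <- c(s1 = "ACGTACGTACGTACGTACGT",
          s2 = "ACGTACGTACGTACGTACGA",
          s3 = "TTGTACGTACGTACGTACGA")

d <- p_distance(seqs)

# Assumption: the threshold is used as the cut height of the tree.
threshold <- 0.12
cl_single   <- cutree(hclust(d, method = "single"),   h = threshold)
cl_complete <- cutree(hclust(d, method = "complete"), h = threshold)

cl_single    # s3 joins via its nearest neighbour s2 (0.10 <= 0.12)
cl_complete  # s3 stays apart: its farthest neighbour s1 is at 0.15
```

With single linkage a sequence joins a group as soon as it is close enough to any member, whereas complete linkage requires it to be close enough to every member, so the same threshold generally yields tighter groups with "complete".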
Value
An integer vector with group memberships.
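As a small sketch of how such a membership vector might be used, the base R snippet below summarises group sizes and picks members per group. The vector cl is hypothetical (illustrative values only, not the output of seq_cluster), and it assumes that element i of the result refers to sequence i of the input.

```r
# Hypothetical membership vector for seven sequences (illustrative values,
# not produced by seq_cluster).
cl <- c(1L, 2L, 2L, 3L, 2L, 1L, 3L)

# Number of sequences in each group.
sizes <- table(cl)

# Indices of the sequences belonging to each group.
members <- split(seq_along(cl), cl)

# Index of the first sequence encountered in each group.
first_of_each <- which(!duplicated(cl))
```

The same indices can then be used to subset the original vector of sequences, for example to inspect one group at a time.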
See Also
Function seq_consensus to compute consensus
and representative sequences for clusters.
Other aggregation operations:
seq_consensus()
Examples
x <- c("-----TACGCAGTAAAAGCTACTGATG",
"CGTCATACGCAGTAAAAACTACTGATG",
"CTTCATACGCAGTAAAAACTACTGATG",
"CTTCATATGCAGTAAAAACTACTGATG",
"CTTCATACGCAGTAAAAACTACTGATG",
"CGTCATACGCAGTAAAAGCTACTGATG",
"CTTCATATGCAGTAAAAGCTACTGACG")
x <- dna(x)
seq_cluster(x)
[Package bioseq version 0.1.4 Index]