seq_cluster {bioseq}R Documentation

Cluster sequences by similarity

Description

Cluster sequences by similarity

Usage

seq_cluster(x, threshold = 0.05, method = "complete")

Arguments

x

a DNA, RNA or AA vector of sequences to clustered.

threshold

Threshold value (range in [0, 1]).

method

the clustering method (see details).

Details

The function uses ape dist.dna and dist.aa functions to compute pairwise distances among sequences and hclust for clustering.

Computing a full pairwise diastance matrix can be computationally expensive. It is recommended to use this function for moderate size dataset.

Supported methods are:

Value

An integer vector with group memberships.

See Also

Function seq_consensus to compute consensus and representative sequences for clusters.

Other aggregation operations: seq_consensus()

Examples


x <- c("-----TACGCAGTAAAAGCTACTGATG",
       "CGTCATACGCAGTAAAAACTACTGATG",
       "CTTCATACGCAGTAAAAACTACTGATG",
       "CTTCATATGCAGTAAAAACTACTGATG",
       "CTTCATACGCAGTAAAAACTACTGATG",
       "CGTCATACGCAGTAAAAGCTACTGATG",
       "CTTCATATGCAGTAAAAGCTACTGACG")
x <- dna(x)
seq_cluster(x)


[Package bioseq version 0.1.4 Index]