R: Cluster sequences by similarity

seq_cluster {bioseq}

R Documentation

Cluster sequences by similarity

Description

Cluster sequences by similarity

Usage

seq_cluster(x, threshold = 0.05, method = "complete")

Arguments

`x`	a DNA, RNA or AA vector of sequences to be clustered.
`threshold`	the threshold value (in the range [0, 1]).
`method`	the clustering method (see Details).

Details

The function uses the dist.dna and dist.aa functions from the ape package to compute pairwise distances among sequences, and hclust to cluster them.

Computing a full pairwise distance matrix can be computationally expensive, because the number of pairs grows quadratically with the number of sequences. This function is therefore recommended for moderately sized datasets.

Supported methods are:

"single" (= Nearest Neighbour Clustering)
"complete" (= Farthest Neighbour Clustering)
"average" (= UPGMA)
"mcquitty" (= WPGMA)
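
The pipeline described above (pairwise distances, hierarchical clustering, then group assignment) can be sketched with base R alone. This is an illustration under stated assumptions, not the package's implementation: the helper p_distance is hypothetical and computes a plain proportion of mismatches as a stand-in for ape's dist.dna, and cutting the tree with cutree at height 0.05 is an assumed reading of how threshold is applied.

```r
# Hypothetical helper (not part of bioseq or ape): proportion of differing
# sites between two aligned sequences of equal length, ignoring positions
# where either sequence has a gap.
p_distance <- function(a, b) {
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  keep <- a != "-" & b != "-"
  mean(a[keep] != b[keep])
}

seqs <- c("-----TACGCAGTAAAAGCTACTGATG",
          "CGTCATACGCAGTAAAAACTACTGATG",
          "CTTCATACGCAGTAAAAACTACTGATG",
          "CTTCATATGCAGTAAAAACTACTGATG",
          "CTTCATACGCAGTAAAAACTACTGATG",
          "CGTCATACGCAGTAAAAGCTACTGATG",
          "CTTCATATGCAGTAAAAGCTACTGACG")

# Full pairwise distance matrix: n * n comparisons, hence the cost warning.
n <- length(seqs)
d <- matrix(0, nrow = n, ncol = n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- p_distance(seqs[i], seqs[j])
  }
}

# Hierarchical clustering with one of the supported linkage methods.
tree <- hclust(as.dist(d), method = "complete")

# Assumption: the threshold is used as the height at which the tree is cut.
groups <- cutree(tree, h = 0.05)
```

Swapping method for "single", "average" or "mcquitty" changes only how the distance between two clusters is derived from the pairwise distances; the distance matrix itself is unchanged.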

Value

An integer vector with group memberships.
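
A membership vector of this kind is typically consumed with base R tools. The sketch below uses made-up sequences and a made-up membership vector purely for illustration; they are not the output of seq_cluster.

```r
# Illustrative values only: stand-ins for real sequences and for a
# membership vector as returned by a clustering call.
seqs   <- c("ACGT", "ACGA", "TTGC", "ACGT", "TTGC")
groups <- c(1L, 1L, 2L, 1L, 2L)

# Number of sequences in each cluster.
cluster_sizes <- table(groups)

# Sequences grouped by cluster, as a list with one element per cluster.
by_cluster <- split(seqs, groups)

# First sequence encountered in each cluster, as a simple representative.
representatives <- seqs[!duplicated(groups)]
```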

Examples


library(bioseq)

# Seven aligned DNA sequences; the first one starts with gaps.
x <- c("-----TACGCAGTAAAAGCTACTGATG",
       "CGTCATACGCAGTAAAAACTACTGATG",
       "CTTCATACGCAGTAAAAACTACTGATG",
       "CTTCATATGCAGTAAAAACTACTGATG",
       "CTTCATACGCAGTAAAAACTACTGATG",
       "CGTCATACGCAGTAAAAGCTACTGATG",
       "CTTCATATGCAGTAAAAGCTACTGACG")
x <- dna(x)
seq_cluster(x)

[Package bioseq version 0.1.4 Index]
