asv2otu {MiscMetabar}    R Documentation

Recluster sequences of an object of class physeq or a list of DNA sequences

Description

[Maturing]

Usage

asv2otu(
  physeq = NULL,
  dna_seq = NULL,
  nproc = 1,
  method = "clusterize",
  id = 0.97,
  vsearchpath = "vsearch",
  tax_adjust = 0,
  vsearch_cluster_method = "--cluster_size",
  vsearch_args = "--strand both",
  keep_temporary_files = FALSE,
  swarmpath = "swarm",
  d = 1,
  swarm_args = "--fastidious",
  method_clusterize = "overlap",
  ...
)

Arguments

physeq

(required): a phyloseq-class object obtained using the phyloseq package.

dna_seq

A character vector of DNA sequences that may be supplied directly in place of the physeq argument. When physeq is set, the DNA sequences are taken from physeq@refseq.

nproc

(default: 1) Number of CPUs/processors to use for the clustering.

method

(default: clusterize) Set the clustering method.

  • clusterize: use the DECIPHER::Clusterize() function,

  • vsearch: use the vsearch software (https://github.com/torognes/vsearch) with arguments --cluster_size by default (see args vsearch_cluster_method) and --strand both (see args vsearch_args),

  • swarm: use the swarm software (see args swarmpath, d and swarm_args).

id

(default: 0.97) Identity threshold used for clustering (e.g. 0.97 for 97% identity).

vsearchpath

(default: vsearch) Path to the vsearch executable.

tax_adjust

(default: 0) See the man page of merge_taxa_vec() for more details. To conserve the taxonomic rank of the most abundant ASV, set tax_adjust to 0 (default). For the moment, only tax_adjust = 0 is robust.

vsearch_cluster_method

(default: "--cluster_size") See other possible methods in the vsearch manual (e.g. --cluster_fast or --cluster_smallmem):

  • --cluster_fast : Clusterize the fasta sequences in filename, automatically sort by decreasing sequence length beforehand.

  • --cluster_size : Clusterize the fasta sequences in filename, automatically sort by decreasing sequence abundance beforehand.

  • --cluster_smallmem : Clusterize the fasta sequences in filename without automatically modifying their order beforehand. Sequences are expected to be sorted by decreasing sequence length, unless --usersort is used. In that case, you may set vsearch_args = "--strand both --usersort".
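As an illustration of the options listed above, the following sketch switches the vsearch clustering method and passes the matching extra arguments. It only uses arguments documented on this page; the variable names cluster_method, extra_args and d_vs_smallmem are chosen here for illustration, and the call is skipped when MiscMetabar or vsearch is not available.

```r
# Option strings taken from the argument descriptions above.
cluster_method <- "--cluster_smallmem"
extra_args <- "--strand both --usersort"

# Only run the clustering when the package and the vsearch binary are found.
if (requireNamespace("MiscMetabar", quietly = TRUE) &&
  MiscMetabar::is_vsearch_installed()) {
  library(MiscMetabar)
  d_vs_smallmem <- asv2otu(
    data_fungi_mini,
    method = "vsearch",
    vsearch_cluster_method = cluster_method,
    vsearch_args = extra_args
  )
}
```

Adding --usersort matters here because --cluster_smallmem otherwise expects the sequences to be sorted by decreasing length.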

vsearch_args

(default: "--strand both") A length-one character vector defining other parameters to pass on to vsearch.

keep_temporary_files

(logical, default: FALSE) Whether to keep the temporary files:

  • temp.fasta (refseq in fasta or dna_seq sequences)

  • cluster.fasta (centroid if method = "vsearch")

  • temp.uc (clusters if method = "vsearch")

swarmpath

(default: swarm) Path to the swarm executable.

d

(default: 1) Maximum number of differences allowed between two amplicons, meaning that two amplicons will be grouped if they have d or fewer differences.
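To make the notion of "d differences" concrete, here is a minimal base-R sketch that counts the differences between two amplicons with utils::adist(), which returns an edit distance (substitutions, insertions and deletions). The two sequences are made up for illustration, and this is not how swarm itself is implemented; it only shows the pairwise criterion described above.

```r
# Two hypothetical amplicons that differ by a single substitution
# (position 7: G versus C).
amplicon_a <- "ACGTACGTAC"
amplicon_b <- "ACGTACCTAC"

# Edit distance between the two sequences.
n_diff <- as.integer(utils::adist(amplicon_a, amplicon_b))

# With d = 1, this pair satisfies the grouping criterion.
d <- 1
within_d <- n_diff <= d
```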

swarm_args

(default: "--fastidious") A length-one character vector defining other parameters to pass on to swarm. See other possible options in the swarm PDF manual.

method_clusterize

(default: "overlap") Value passed to the method argument of DECIPHER::Clusterize().

...

Other arguments passed on to DECIPHER::Clusterize().

Details

This function uses the merge_taxa_vec() function to merge taxa into clusters. By default, tax_adjust = 0. See the man page of merge_taxa_vec().

Value

A new object of class physeq, or a list of clusters if the dna_seq argument was used.

Author(s)

Adrien Taudière

References

VSEARCH can be downloaded from https://github.com/torognes/vsearch. More information is available in the associated publication: https://pubmed.ncbi.nlm.nih.gov/27781170.

See Also

vsearch_clustering() and swarm_clustering()

Examples

if (requireNamespace("DECIPHER")) {
  asv2otu(data_fungi_mini)
}

if (requireNamespace("DECIPHER")) {
  asv2otu(data_fungi_mini, method_clusterize = "longest")

  if (MiscMetabar::is_swarm_installed()) {
    d_swarm <- asv2otu(data_fungi_mini, method = "swarm")
  }
  if (MiscMetabar::is_vsearch_installed()) {
    d_vs <- asv2otu(data_fungi_mini, method = "vsearch")
  }
}


[Package MiscMetabar version 0.9.1 Index]