
assign.ensembleTax {ensembleTax}

R Documentation

Computes ensemble taxonomic assignments for each ASV in an amplicon data set

Description

Computes ensemble taxonomic assignments for each ASV in an amplicon data set

Usage

assign.ensembleTax(
  x,
  tablenames = names(x),
  ranknames = c("kingdom", "supergroup", "division", "class", "order", "family",
    "genus", "species"),
  weights = rep(1, length(x)),
  tiebreakz = NULL,
  count.na = TRUE,
  assign.threshold = 0
)

Arguments

`x`	A list of dataframes whose columns are of type character or list (no factors). Each dataframe may contain an arbitrary number of metadata columns (e.g. ASV sequences or ASV numbers) plus columns named according to ranknames that hold the taxonomic assignment of each ASV in the data set.
`tablenames`	A character vector of the names of each taxonomy table provided in x. Default is names(x).
`ranknames`	The names of the ranks (columns) of the taxonomy tables included in x, in descending rank order. These are used to distinguish taxonomic columns from ASV-identifying columns, so that the ASV-identifying data can be tracked through the ensemble calculations.
`weights`	A numeric vector of length length(x) giving the relative weight of the taxonomic assignments in the corresponding element of x. The default, rep(1, length(x)), weights the assignments of all taxonomy tables equally. All values must be integers.
`tiebreakz`	NULL (the default), or a character vector of tablenames listed in order of priority, used as a tie-breaker when multiple taxonomic names are found at equal (weighted) highest frequencies above assign.threshold.
`count.na`	TRUE or FALSE, indicating whether NA assignments are considered in the ensemble calculation (TRUE) or ignored (FALSE). assign.threshold is implemented differently depending on this setting.
`assign.threshold`	A number between 0 and 1 giving the (weighted) proportion of input taxonomy tables in which a taxonomic name must be assigned for that name to be included in the ensemble taxonomic assignment. When count.na=FALSE, proportions are calculated only relative to the number of tables with no NA assignment. When count.na=TRUE, proportions are calculated relative to the sum of the weights argument.
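To make the two denominators described for assign.threshold concrete, the sketch below computes the weighted proportion of each candidate name at one rank for one ASV. The helper weighted_props is hypothetical and is not part of ensembleTax. It assumes that with count.na=FALSE the denominator is the summed weight of the tables with non-NA assignments; with all weights equal to 1 this is the number of such tables, as described above.

```r
# Hypothetical helper (not part of ensembleTax): weighted proportion of each
# candidate name at one rank for one ASV. 'assignments' has one entry per
# taxonomy table; 'weights' gives the matching table weights.
weighted_props <- function(assignments, weights, count.na = TRUE) {
  stopifnot(length(assignments) == length(weights))
  if (!count.na) {
    # count.na = FALSE: drop NA assignments from numerator and denominator
    keep <- !is.na(assignments)
    assignments <- assignments[keep]
    weights <- weights[keep]
  }
  if (length(assignments) == 0) return(numeric(0))  # every table gave NA
  # Tally NA as its own candidate (only reachable when count.na = TRUE)
  keys <- ifelse(is.na(assignments), "<NA>", assignments)
  uniq <- unique(keys)
  totals <- vapply(uniq, function(u) sum(weights[keys == u]), numeric(1))
  # count.na = TRUE: denominator is the sum of all weights
  # count.na = FALSE: denominator is the summed weight of non-NA tables
  totals / sum(weights)
}

# Division of ASV "ATCG" with weights = c(2, 1): Ochrophyta = 2/3, Opalozoa = 1/3
weighted_props(c("Ochrophyta", "Opalozoa"), weights = c(2, 1))
# Class of ASV "TATA": MAST-12 = 0.5 and <NA> = 0.5 when NA is counted ...
weighted_props(c("MAST-12", NA), weights = c(1, 1), count.na = TRUE)
# ... but MAST-12 = 1 when NA is ignored
weighted_props(c("MAST-12", NA), weights = c(1, 1), count.na = FALSE)
```

The same name can therefore pass a 0.5 threshold under count.na=FALSE and only tie with NA under count.na=TRUE.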

Details

The algorithm takes as input a list of taxonomy tables (dataframes whose columns are of type character or list; no factors) and assumes that rows correspond to ASVs/OTUs and that columns correspond to taxonomic assignments at the ranks listed, in descending order, in ranknames. All taxonomy tables should follow the same taxonomic nomenclature (naming and ranking conventions), should include ASV/OTU-identifying columns (e.g. ASV sequences or a column of ASV numbers), and should be ordered so that each row represents the same ASV/OTU in every table. Using the functions bayestax2df, idtax2df, and/or taxmapper will ensure that your taxonomy tables meet these requirements. Be advised that assign.ensembleTax sets the rownames of each taxonomy table to NULL.
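The input requirements above can be checked before calling assign.ensembleTax. The function check_tax_tables below is a hypothetical helper written for illustration; it is not part of ensembleTax. It assumes that any column not listed in ranknames is ASV-identifying, and that such columns shared between tables must match row for row.

```r
# Hypothetical pre-flight check (not an ensembleTax function) for the input
# requirements described above. Returns TRUE invisibly or stops with a message.
check_tax_tables <- function(x, ranknames) {
  stopifnot(is.list(x), length(x) > 0)
  ref <- x[[1]]
  for (i in seq_along(x)) {
    tt <- x[[i]]
    if (!is.data.frame(tt)) stop("element ", i, " is not a data frame")
    absent <- setdiff(ranknames, colnames(tt))
    if (length(absent) > 0)
      stop("table ", i, " lacks rank columns: ", paste(absent, collapse = ", "))
    if (any(vapply(tt, is.factor, logical(1))))
      stop("table ", i, " contains factor columns")
    if (nrow(tt) != nrow(ref))
      stop("table ", i, " has a different number of rows than table 1")
    # Assumption: non-rank columns are ASV-identifying and must line up by row
    id_cols <- intersect(setdiff(colnames(tt), ranknames), colnames(ref))
    for (col in id_cols) {
      if (!identical(as.character(tt[[col]]), as.character(ref[[col]])))
        stop("table ", i, " column '", col, "' does not match table 1 row for row")
    }
  }
  invisible(TRUE)
}

a <- data.frame(ASV = c("AAAA", "ATCG"), kingdom = c("Eukaryota", NA),
                stringsAsFactors = FALSE)
b <- a[2:1, ]                                  # same ASVs, rows reordered
check_tax_tables(list(a, a), "kingdom")        # passes silently
try(check_tax_tables(list(a, b), "kingdom"))   # error: ASV column misaligned
```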

Ensemble taxonomic assignments are computed by finding the highest-frequency taxonomic assignment for each ASV across all input taxonomy tables. Several parameters let the user control this computation: weights gives the assignments of specific taxonomy tables more influence than others; tiebreakz favors the assignment of a specific table when multiple assignments are found at the same (weighted) highest frequency; assign.threshold sets a (weighted) frequency threshold above which an assignment must be found to be included in the ensemble; and count.na determines whether non-assignments, signalled by NA, are included in the frequency and assignment computations.
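The weighted vote described above can be sketched for a single ASV at a single rank. ensemble_vote is a hypothetical function, not the package's implementation. It assumes that a name qualifies when its proportion is at least assign.threshold, and that a tie which tiebreakz cannot resolve yields NA; the documentation does not state either detail, and the package may also enforce consistency of assignments across ranks in ways not shown here.

```r
# Hypothetical sketch (not the ensembleTax implementation): weighted vote for
# one ASV at one rank. assignments, weights and tablenames are parallel
# vectors with one entry per taxonomy table.
ensemble_vote <- function(assignments, weights, tablenames,
                          tiebreakz = NULL, count.na = TRUE,
                          assign.threshold = 0) {
  keys <- ifelse(is.na(assignments), "<NA>", assignments)
  # count.na = FALSE: tables that returned NA take no part in the vote
  keep <- if (count.na) rep(TRUE, length(keys)) else !is.na(assignments)
  if (!any(keep)) return(NA_character_)
  k <- keys[keep]
  w <- weights[keep]
  tn <- tablenames[keep]
  # Weighted proportion behind each distinct name
  uniq <- unique(k)
  props <- vapply(uniq, function(u) sum(w[k == u]), numeric(1)) / sum(w)
  # Assumption: a name qualifies if its proportion is >= assign.threshold
  best <- names(props)[props == max(props) & props >= assign.threshold]
  if (length(best) == 0) return(NA_character_)  # nothing clears the threshold
  if (length(best) > 1) {
    # Walk tiebreakz in priority order; keep the first tied name it supports
    for (tb in tiebreakz) {
      hit <- k[tn == tb]
      if (length(hit) == 1 && hit %in% best) {
        best <- hit
        break
      }
    }
    # Assumption: a tie that tiebreakz cannot resolve yields NA
    if (length(best) > 1) return(NA_character_)
  }
  if (best == "<NA>") NA_character_ else best
}

tn <- c("fake1", "fake2")
# Division of ASV "ATCG" with equal weights: a 50/50 tie, no tie-breaker
ensemble_vote(c("Ochrophyta", "Opalozoa"), c(1, 1), tn, assign.threshold = 0.5)
# returns NA
ensemble_vote(c("Ochrophyta", "Opalozoa"), c(1, 1), tn,
              tiebreakz = c("fake2", "fake1"), assign.threshold = 0.5)
# returns "Opalozoa"
ensemble_vote(c("Ochrophyta", "Opalozoa"), c(2, 1), tn, assign.threshold = 0.5)
# returns "Ochrophyta" (weighted proportion 2/3)
```

Running this vote independently at each rank is a simplification chosen for brevity.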

The output is a dataframe of ASVs and corresponding ensemble taxonomic assignments.

Value

a dataframe containing ensemble taxonomic assignments

Author(s)

Dylan Catlett

Kevin Son

Examples

library(ensembleTax)

# Two toy taxonomy tables for the same five ASVs, using PR2-style ranks.
# The tables disagree at some ranks (e.g. the division of ASV "ATCG").
fake1.pr2 <- data.frame(ASV = c("AAAA", "ATCG", "GCGC", "TATA", "TCGA"),
         kingdom = c("Eukaryota", "Eukaryota", "Eukaryota", "Eukaryota",
         "Eukaryota"),
         supergroup = c(NA, "Stramenopiles", "Rhizaria", "Stramenopiles",
         "Alveolata"),
         division = c(NA, "Ochrophyta", "Radiolaria", "Opalozoa",
         "Dinoflagellata"),
         class = c(NA, "Bacillariophyta", "Polycystinea", "MAST-12",
         "Syndiniales"),
         order = c(NA, "Bacillariophyta_X", "Collodaria", "MAST-12A", NA),
         family = c(NA, "Polar-centric-Mediophyceae", "Collophidiidae", NA,
         NA),
         genus = c(NA, NA, "Collophidium", NA, NA),
         species = as.character(c(NA, NA, NA, NA, NA)),
         stringsAsFactors = FALSE)
fake2.pr2 <- data.frame(ASV = c("AAAA", "ATCG", "GCGC", "TATA", "TCGA"),
         kingdom = c("Eukaryota", "Eukaryota", "Eukaryota", "Eukaryota",
         "Eukaryota"),
         supergroup = c(NA, "Stramenopiles", "Rhizaria", "Stramenopiles",
         "Alveolata"),
         division = c(NA, "Opalozoa", "Radiolaria", "Opalozoa",
         "Dinoflagellata"),
         class = c(NA, NA, "Polycystinea", NA, "Dinophyceae"),
         order = c(NA, NA, "Collodaria", NA, NA),
         family = c(NA, NA, "Collophidiidae", NA, NA),
         genus = c(NA, NA, "Collophidium", NA, NA),
         species = as.character(c(NA, NA, NA, NA, NA)),
         stringsAsFactors = FALSE)
head(fake1.pr2)
head(fake2.pr2)

# Combine the tables into a named list; the names serve as tablenames
xx <- list(fake1.pr2, fake2.pr2)
names(xx) <- c("fake1", "fake2")
xx

# Equal weights, NA assignments counted, 0.5 (weighted) threshold
eTax <- assign.ensembleTax(xx,
                    tablenames = names(xx),
                    ranknames = c("kingdom", "supergroup", "division",
                    "class", "order", "family", "genus", "species"),
                    tiebreakz = NULL,
                    count.na = TRUE,
                    assign.threshold = 0.5,
                    weights = rep(1, length(xx)))
head(eTax)

# fake1 weighted twice as heavily as fake2; NA assignments ignored
eTax <- assign.ensembleTax(xx,
                    tablenames = names(xx),
                    ranknames = c("kingdom", "supergroup", "division",
                    "class", "order", "family", "genus", "species"),
                    tiebreakz = NULL,
                    count.na = FALSE,
                    assign.threshold = 0.5,
                    weights = c(2, 1))
head(eTax)

[Package ensembleTax version 1.1.1 Index]
