R: resolve_duplicates

resolve_duplicates {fossilbrush}

R Documentation

resolve_duplicates

Description

Identifies and resolves alternative higher assignments in a hierarchically structured dataframe. Columns are checked from the lowest to the highest rank for elements with multiple higher assignments. These assignments are then assessed topologically to determine whether they represent inadvertent use of the same name at a given rank for genuinely different entities, or whether the higher classifications conflict. In the former case, a unique character suffix is applied to each differently classified case (up to 26 are currently supported), effectively splitting the alternatively classified element. In the latter case, the alternative classifications are either combined, or the more frequently used or the more complete classification scheme is taken (the most frequent pathway can also be the most complete).
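
The detection and splitting steps described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy table, the two-rank layout and the letter-suffix format are all assumptions made for illustration, and the topological assessment that decides between splitting and merging is omitted.

```r
# Toy classification table (hypothetical data, not from fossilbrush)
toy <- data.frame(
  family = c("Fam_A", "Fam_A", "Fam_B", "Fam_C"),
  genus  = c("Gen_1", "Gen_2", "Gen_2", "Gen_3"),
  stringsAsFactors = FALSE
)

# Count the distinct higher assignments recorded for each genus name
n_parents <- tapply(toy$family, toy$genus, function(f) length(unique(f)))

# Names with more than one higher assignment
multi <- names(n_parents)[n_parents > 1]

# Assume each such name denotes genuinely different entities and split it
# by appending one letter per distinct parent (suffix format is assumed)
for (nm in multi) {
  rows <- which(toy$genus == nm)
  parents <- unique(toy$family[rows])
  toy$genus[rows] <- paste0(nm, LETTERS[match(toy$family[rows], parents)])
}

print(multi)
print(toy)
```

Using LETTERS for the suffixes mirrors the stated limit of 26 differently classified cases per name, although the function itself may format its suffixes differently.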

Usage

resolve_duplicates(x, ranks = NULL, jump = 4, plot = FALSE, verbose = TRUE)

Arguments

`x`	A dataframe containing hierarchically structured information, for example a table of genus names and their higher taxonomic classifications
`ranks`	If not NULL, a vector of column names of x, given in rank order. This is useful if x contains columns which are not rank relevant or if columns are not in hierarchical order. If not supplied, the column order in x is used directly and is assumed to be in rank order
`jump`	The maximum number of levels between the point of divergence and the point of reunion (if present) for a given path, below which the divergence will be taken as conflicting
`plot`	A logical specifying whether the divergent paths should be plotted
`verbose`	A logical of length one which determines if the function should report the detection and resolution of elements with multiple higher classifications (if any)

Value

The dataframe x, with any alternative higher classifications resolved, giving the classification a strict tree structure
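
One way to check the strict tree property of a result is sketched below in base R. The helper is_strict_tree is hypothetical and not part of fossilbrush. It assumes ranks are given from highest to lowest, and it only compares adjacent ranks, so it is a simplified check rather than the path-based assessment the function performs.

```r
# Hypothetical helper: TRUE if every name has exactly one parent at the
# rank directly above it. `ranks` is ordered from highest to lowest.
is_strict_tree <- function(df, ranks) {
  for (i in seq_len(length(ranks) - 1)) {
    # Unique (parent, child) combinations for this pair of adjacent ranks
    pairs <- unique(df[, c(ranks[i], ranks[i + 1])])
    # Ignore rows where the child name is missing
    pairs <- pairs[!is.na(pairs[[2]]), ]
    # A child name appearing in more than one pair has several parents
    if (anyDuplicated(pairs[[2]]) > 0) return(FALSE)
  }
  TRUE
}

# Hypothetical toy tables
clean <- data.frame(family = c("A", "A", "B"), genus = c("g1", "g2", "g3"))
mixed <- data.frame(family = c("A", "A", "B"), genus = c("g1", "g2", "g2"))

print(is_strict_tree(clean, c("family", "genus")))
print(is_strict_tree(mixed, c("family", "genus")))
```

In this sketch a missing parent counts as a distinct parent, so a name recorded once with a family and once without one is reported as conflicting.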

Examples

# load dataset
data("brachios")
# define ranks
b_ranks <- c("phylum", "class", "order", "family", "genus")
# run function
res <- resolve_duplicates(brachios, ranks = b_ranks)

[Package fossilbrush version 1.0.5 Index]