specificity_scores {POMS} | R Documentation |
Compute the shrunken specificity score of each feature, which represents how strongly the presence of that feature is associated with a given sample grouping.
Description
This code replicates the environmental specificity score introduced in phylogenize. The code here is modified from the phylogenize code base (https://bitbucket.org/pbradz/phylogenize/src/master/package/phylogenize/R/; commit 6f1bdba9c5a9ff04e90a8ad77bcee8ec9281730d).
Usage
specificity_scores(
abun_table,
meta_table,
focal_var_level,
var_colname,
sample_colname,
silence_citation = FALSE
)
Arguments
abun_table |
abundance table to use for computing specificity. Features must be rows and samples must be columns. All values greater than 0 will be interpreted as present. |
meta_table |
data frame containing metadata for all samples. Must include at least one column containing the sample ids and one column containing the metadata variable of interest. |
focal_var_level |
length-one character vector specifying the value of the variable of interest to which inferences of prevalence are restricted. In other words, prevalence will be computed based on the set of samples that have this value of the variable of interest in the metadata table. |
var_colname |
length-one character vector specifying the name of the column in the metadata table that contains the metadata of interest (i.e., the column in which focal_var_level can be found). |
sample_colname |
length-one character vector specifying the name of the column in the metadata table that contains the sample ids. |
silence_citation |
length-one Boolean vector specifying whether to silence the message notifying the user about the phylogenize package and paper. |
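As a minimal sketch of the input shapes described by these arguments, the base R code below builds a toy abundance table and metadata table, applies the greater-than-zero presence rule, and computes the raw prevalence of each feature among the focal samples. All table contents and the column names `sample_id` and `env` are made up for illustration. The raw prevalence shown here only illustrates how the arguments select samples; it is not the shrunken specificity score that the function returns.

```r
# Toy abundance table: features as rows, samples as columns (hypothetical values).
abun_table <- data.frame(
  s1 = c(5, 0, 2),
  s2 = c(0, 0, 7),
  s3 = c(3, 1, 0),
  s4 = c(0, 4, 0),
  row.names = c("featA", "featB", "featC")
)

# Toy metadata: one column of sample ids and one column for the variable of interest.
meta_table <- data.frame(
  sample_id = c("s1", "s2", "s3", "s4"),
  env = c("gut", "gut", "soil", "soil"),
  stringsAsFactors = FALSE
)

focal_var_level <- "gut"
var_colname <- "env"
sample_colname <- "sample_id"

# Any value greater than 0 counts as present (logical matrix, same dimnames).
presence <- abun_table > 0

# Ids of the samples whose metadata value matches the focal level.
focal_samples <- meta_table[meta_table[[var_colname]] == focal_var_level,
                            sample_colname]

# Raw (unshrunken) prevalence of each feature among the focal samples.
focal_prevalence <- rowMeans(presence[, focal_samples, drop = FALSE])
```

The same five objects (`abun_table`, `meta_table`, `focal_var_level`, `var_colname`, `sample_colname`) correspond one-to-one to the arguments listed above.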
Details
This algorithm is described in detail in Bradley et al. 2018. Phylogeny-corrected identification of microbial gene families relevant to human gut colonization. PLOS Computational Biology.
Note that there can be some random fluctuations between re-runs of this function. The differences are usually minor, but users are strongly advised to set a random seed before use to ensure their workflow is reproducible.
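The seeding pattern can be sketched as follows. To keep the example runnable without the package or real data, `stochastic_step()` is a hypothetical stand-in for the call to specificity_scores(); the seed value 42 is arbitrary.

```r
# Hypothetical stand-in for a stochastic computation; in a real workflow,
# replace the body with a call to specificity_scores() on your own tables.
stochastic_step <- function() {
  stats::runif(3)
}

set.seed(42)  # arbitrary seed; any fixed integer works
run1 <- stochastic_step()

set.seed(42)  # reset the same seed immediately before the re-run
run2 <- stochastic_step()

# With the seed reset before each run, the two results match exactly.
identical(run1, run2)
```

Setting the seed once at the top of a script is usually enough, provided the script is always run from start to finish in the same order.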
Value
Numeric vector with the specificity score for each input feature (i.e., for each row of abun_table).