R: chemodiv: A package for analysing phytochemical diversity

chemodiv {chemodiv}

R Documentation

chemodiv: A package for analysing phytochemical diversity

Description

chemodiv is an R package for analysing the chemodiversity of phytochemical data. The package includes a number of functions that enable quantification and visualization of phytochemical diversity and dissimilarity for any type of phytochemical (or similar) samples, such as herbivore defence compounds or volatiles. Importantly, calculations of diversity and dissimilarity can incorporate biosynthetic and/or structural properties of the phytochemical compounds, resulting in more comprehensive quantifications of diversity and dissimilarity. The functions work best for datasets of the kind commonly generated by chemical ecologists using GC-MS, LC-MS or similar methods, where all or most compounds in the samples have been confidently identified. See Petren et al. 2023a for a detailed description of the package, and Petren et al. 2023b for a more in-depth discussion and review of plant chemodiversity.

Details

Two datasets are needed to use the full set of analyses included in the package.

The first dataset should contain data on the relative abundance/concentration (i.e. proportion) of different compounds (columns) in different samples (rows). See the included dataset minimalSampData for a basic example. Note that all calculations of diversity, and most calculations of dissimilarity, are only performed on relative, rather than absolute, values.
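As a minimal sketch of the layout described above, the following base R code builds a small sample dataset from hypothetical absolute peak areas and converts each row to proportions. The compound names (compA, compB, compC), the sample names and all values are invented for illustration and are not taken from minimalSampData.

```r
# Hypothetical absolute peak areas: samples in rows, compounds in columns
absData <- data.frame(
  compA = c(120, 30, 0),
  compB = c(60, 30, 50),
  compC = c(20, 40, 50),
  row.names = c("sample1", "sample2", "sample3")
)

# Convert each row to proportions (every row then sums to 1)
sampData <- as.data.frame(prop.table(as.matrix(absData), margin = 1))

print(sampData)
print(rowSums(sampData))
```

A compound that is absent from a sample simply has a proportion of 0 in that row, as compA does in sample3 here.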

The second dataset should be a data frame with three columns containing the compound name, SMILES and InChIKey IDs of all the compounds present in the first dataset. See the included dataset minimalCompData for a basic example. SMILES and InChIKey are chemical identifiers that are easily obtained for each compound by searching for it in PubChem https://pubchem.ncbi.nlm.nih.gov/. A search with a common name will bring up the compound's record in the database, which includes the (isomeric/canonical) SMILES and InChIKey. Automated tools such as the PubChem Identifier Exchange Service https://pubchem.ncbi.nlm.nih.gov/idexchange/idexchange.cgi or the Chemical Translation Service https://cts.fiehnlab.ucdavis.edu/ can also be used. The user is intentionally required to compile the chemical identifiers manually to ensure that they are correct: lists of compounds very often contain compounds that are wrongly named, wrongly formatted or listed under various synonyms, which prevents easy automatic translation of compound names to SMILES and InChIKey.

Note that SMILES IDs might contain the character combination "\C". If SMILES are typed directly into R code, this is interpreted as an unrecognized escape sequence and results in an error. In this case, an extra backslash has to be added: "\\C". If the dataset is instead imported into R from a csv or txt file (recommended), the backslash is handled automatically and no manual edits have to be made.
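The backslash issue can be illustrated with a short base R sketch. It uses the SMILES of cis-2-butene purely as an example of a string containing "\C". The raw-string syntax assumes R 4.0.0 or later, and the temporary csv file is a stand-in for a user's own compound file (its InChIKey column is omitted for brevity).

```r
# Typing smiles <- "C/C=C\C" fails to parse, because "\C" is not a valid escape.
# Doubling the backslash gives a string holding a single literal backslash:
smilesEscaped <- "C/C=C\\C"

# Raw strings (R >= 4.0.0) avoid the doubling altogether
smilesRaw <- r"(C/C=C\C)"

identical(smilesEscaped, smilesRaw)
nchar(smilesEscaped)  # 7 characters, the backslash counts once

# Importing from a csv file needs no manual edits: the file holds the plain
# SMILES text, and read.csv() keeps the backslash as it is
tmp <- tempfile(fileext = ".csv")
writeLines(c("compound,smiles", r"(cis-2-butene,C/C=C\C)"), tmp)
compData <- read.csv(tmp, stringsAsFactors = FALSE)

identical(compData$smiles, smilesEscaped)
```

Note that print() displays the string with a doubled backslash, whereas cat() shows the actual characters stored.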

The second dataset with the chemical IDs is primarily used to construct one or more dissimilarity matrices with pairwise dissimilarities between chemical compounds, which can then be used in calculations of phytochemical diversity and dissimilarity. As noted above, this requires that the compounds in the samples have been identified and their chemical IDs listed. If some compounds in a dataset are unknown, the user can choose between different ways of handling them; see compDis for details. If many or all compounds are unknown, as is common for metabolomics-type datasets, phytochemical diversity and dissimilarity can still be calculated using indices that do not consider compound dissimilarities. Alternatively, compound dissimilarities can be calculated in other ways that do not depend on knowing compound identities. For example, cosine dissimilarities between tandem (MS/MS) mass spectra of metabolomic features can be calculated in the GNPS framework https://gnps.ucsd.edu (Wang et al. 2016). A matrix of such dissimilarities can then be supplied to the compDisMat argument of various functions in the package, thereby enabling comprehensive quantification of phytochemical diversity and dissimilarity also for datasets consisting of unidentified compounds.
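To make the idea of a user-supplied dissimilarity matrix concrete, the sketch below computes plain cosine dissimilarities (1 minus cosine similarity) between hypothetical binned intensity vectors in base R. This is a simplification for illustration only: it is not the spectral matching performed by GNPS, and the feature names and intensities are invented. The commented-out calcDiv() call shows how such a matrix might be passed on; its exact arguments are an assumption and should be checked against the function's help page.

```r
# Hypothetical binned MS/MS intensities, one row per metabolomic feature
spectra <- rbind(
  featA = c(1, 0, 0),
  featB = c(0, 1, 0),
  featC = c(1, 1, 0)
)

# Cosine similarity: dot products divided by the product of vector lengths
norms <- sqrt(rowSums(spectra^2))
cosSim <- tcrossprod(spectra) / outer(norms, norms)

# Convert to dissimilarity and remove floating-point noise on the diagonal
cosDis <- 1 - cosSim
diag(cosDis) <- 0

print(round(cosDis, 3))

# Assumed usage, with sampData columns named featA, featB, featC:
# chemodiv::calcDiv(sampleData = sampData, compDisMat = cosDis,
#                   type = "FuncHillDiv", q = 1)
```

The result is a square, symmetric matrix whose row and column names are the feature names, so it can be matched against the columns of the sample dataset.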

Once the dataset with samples and the dataset with compounds are prepared, these should be imported/constructed as separate data frames in R, and all analyses in the package can then be performed, in largely the same order as they appear in the list below.
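Before running any analyses, it can help to confirm that the two data frames describe the same compounds. The following base R sketch is a hypothetical helper, not part of chemodiv: the function name checkDatasets, the column name compound and the example data are all assumptions made for illustration. The SMILES and InChIKey columns are left out of the example compound data for brevity.

```r
# Hypothetical helper: report mismatches between the two datasets
checkDatasets <- function(sampData, compData, tol = 1e-8) {
  list(
    # Compounds measured in samples but not listed in the compound dataset
    missingFromCompData = setdiff(colnames(sampData), compData$compound),
    # Compounds listed in the compound dataset but absent from the samples
    missingFromSampData = setdiff(compData$compound, colnames(sampData)),
    # Samples whose values do not sum to 1, i.e. are not proportions
    rowsNotProportions = rownames(sampData)[abs(rowSums(sampData) - 1) > tol]
  )
}

# Invented example data: compC has no entry in the compound dataset
sampData <- data.frame(
  compA = c(0.6, 0.3),
  compB = c(0.3, 0.3),
  compC = c(0.1, 0.4),
  row.names = c("sample1", "sample2")
)
compData <- data.frame(compound = c("compA", "compB"))

result <- checkDatasets(sampData, compData)
print(result)
```

Empty entries in the returned list indicate that no problem of that kind was found.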

Author(s)

Hampus Petren, Tobias G. Koellner, Robert R. Junker

References

Petren H, Koellner TG, Junker RR. 2023a. Quantifying chemodiversity considering biochemical and structural properties of compounds with the R package chemodiv. New Phytologist 237: 2478-2492.

Petren H, Anaia RA, Aragam KS, Braeutigam A, Eckert S, Heinen R, Jakobs R, Ojeda L, Popp M, Sasidharan R, Schnitzler J-P, Steppuhn A, Thon F, Tschikin S, Unsicker SB, van Dam NM, Weisser WW, Wittmann MJ, Yepes S, Ziaja D, Mueller C, Junker RR. 2023b. Understanding the phytochemical diversity of plants: Quantification, variation and ecological function. bioRxiv doi: 10.1101/2023.03.23.533415.

Wang M, Carver JJ, Phelan VV, et al. 2016. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nature Biotechnology 34: 828-837.


Data format checks

Compound classification and dissimilarity

Diversity calculations

Sample dissimilarities

Molecular network and properties

Chemodiversity and network plots

Shortcut function
