impute {funspace}    R Documentation

Imputing Trait Information

Description

Imputes missing values in incomplete trait data, optionally using phylogenetic information.

Usage

impute(
  traits,
  phylo = NULL,
  addingSpecies = FALSE,
  nEigen = 10,
  messages = TRUE
)

Arguments

traits

A matrix or data.frame containing trait information with missing values. The rows correspond to observations (generally species) and the columns to variables (generally traits). Traits can be continuous and/or categorical. Row names of the traits object must contain the names of the species. We recommend writing species names in the format "Genus_species" or "Genus species".
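
A minimal sketch of a traits object in the expected shape, using made-up species and values (not taken from any funspace dataset). Missing values are coded as NA and species names are stored as row names. Storing the categorical trait as a factor is an assumption about a safe encoding, not a documented requirement of impute.

```r
# Hypothetical trait table: three species, one continuous and one
# categorical trait, with missing values coded as NA.
traits <- data.frame(
  height = c(12.5, NA, 3.1),
  leaf_type = factor(c("simple", "compound", NA)),
  row.names = c("Quercus_robur", "Fraxinus_excelsior", "Corylus_avellana")
)

# Species names live in the row names, not in a separate column.
rownames(traits)

# Number of missing values per trait.
colSums(is.na(traits))
```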

phylo

(optional) A phylogenetic tree (an object of class "phylo") containing the evolutionary relationships between species. phylo is used to estimate phylogenetic eigenvectors that are added to the traits matrix. Not all species in traits need to be included in phylo, although including them is highly recommended. To assign phylogenetic information to species reliably, the names in phylo$tip.label must be exactly the same as row.names(traits), although not necessarily in the same order. Computing cophenetic distances for very large trees (ca. 30,000 species) can cause memory allocation problems, because the distance matrix holds one value per pair of species (30,000 × 30,000 doubles occupy about 7 GB).
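
Because names must match exactly, it can help to check the overlap between tip labels and trait row names before calling impute. The base R sketch below uses hypothetical name vectors standing in for phylo$tip.label and row.names(traits).

```r
# Hypothetical names; in practice use phylo$tip.label and row.names(traits).
tip_labels <- c("Quercus_robur", "Corylus_avellana", "Betula_pendula")
trait_names <- c("Quercus_robur", "Fraxinus_excelsior", "Corylus_avellana")

# Species with trait data but no tip in the tree (no phylogenetic information).
missing_from_tree <- setdiff(trait_names, tip_labels)

# Tips without trait data; these could be pruned, e.g. with ape::drop.tip().
missing_from_traits <- setdiff(tip_labels, trait_names)

# setdiff() compares strings exactly, so "Quercus robur" and "Quercus_robur"
# would count as different names; order, however, does not matter.
missing_from_tree
missing_from_traits
```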

addingSpecies

Logical, defaults to FALSE. Should species present in the trait matrix but not in the phylogeny be added to the phylogeny? If TRUE, the phytools::add.species.to.genus function is used to add each such species at the root of its genus, provided at least one congeneric species is present in the tree. Note that phytools::add.species.to.genus has further arguments that provide more flexibility, but they are not exposed here for simplicity; users who need those options can modify their phylogenetic tree beforehand.
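
Since a species can only be attached to its genus when a congener is already in the tree, it may be useful to preview which unmatched species could be placed. The sketch below uses hypothetical names and a hypothetical helper, genus_of(). Taking the genus as the text before the first underscore or space is an assumption; it is not necessarily how funspace or phytools parse names.

```r
# Hypothetical names; in practice derive them from phylo$tip.label and
# the species in row.names(traits) that are absent from the tree.
tip_labels <- c("Quercus_robur", "Quercus_ilex", "Corylus_avellana")
not_in_tree <- c("Quercus_suber", "Fraxinus_excelsior")

# Hypothetical helper (assumption): genus = text before the first "_" or space.
genus_of <- function(x) sub("[_ ].*$", "", x)

# TRUE where at least one congeneric species is already a tip of the tree.
has_congener <- genus_of(not_in_tree) %in% genus_of(tip_labels)
names(has_congener) <- not_in_tree
has_congener
```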

nEigen

The number of phylogenetic eigenvectors to be added to the trait matrix. Defaults to 10.
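
This page does not state exactly how the eigenvectors are derived. One common approach, assumed here, is a principal coordinates analysis (PCoA) of the cophenetic distance matrix, keeping the first nEigen axes. The sketch uses base R's cmdscale() on a hand-written distance matrix for a four-species ultrametric tree, so it runs without ape.

```r
# Hypothetical cophenetic distances for the tree ((A_a:1, A_b:1):2, (B_c:2, B_d:2):1),
# of the kind ape::cophenetic.phylo() would return for a real tree.
sp <- c("A_a", "A_b", "B_c", "B_d")
d <- matrix(c(0, 2, 6, 6,
              2, 0, 6, 6,
              6, 6, 0, 4,
              6, 6, 4, 0),
            nrow = 4, dimnames = list(sp, sp))

nEigen <- 2

# PCoA: eigen-decompose the double-centred squared distances and keep
# the first nEigen axes (one row per species).
pcoa <- cmdscale(as.dist(d), k = nEigen, eig = TRUE)
eigvecs <- pcoa$points
colnames(eigvecs) <- paste0("Eigen.", seq_len(nEigen))

# The first axis separates the two genera.
eigvecs
```

Larger nEigen values capture finer phylogenetic structure but add more columns for the imputation model to use.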

messages

Logical, defaults to TRUE. Should the function print messages?

Details

impute fills in missing values in trait matrices with incomplete trait information, using the random forest approach implemented in the missForest package. Phylogenetic information can be incorporated by supplying a phylogenetic tree (phylo), from which nEigen phylogenetic eigenvectors are computed and added to the trait matrix as additional columns available to the imputation.
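
A minimal sketch of the augmentation step described above, with made-up traits and eigenvector values. How impute wires these steps together internally is not shown on this page; the missForest call is left as a comment so the block runs without that package installed.

```r
# Hypothetical traits (with NAs) and phylogenetic eigenvectors for the
# same four species; all values are made up for illustration.
sp <- c("A_a", "A_b", "B_c", "B_d")
traits <- data.frame(height = c(1.2, NA, 3.4, 3.1),
                     mass = c(10, 11, NA, 30),
                     row.names = sp)
eigvecs <- matrix(c(-1, -1, 1, 1,
                    0.5, -0.5, 0, 0),
                  ncol = 2, dimnames = list(sp, c("Eigen.1", "Eigen.2")))

# Align eigenvector rows to the trait rows by species name, then bind columns.
augmented <- cbind(traits, eigvecs[rownames(traits), , drop = FALSE])

# The random forest imputation would then run on the augmented table
# (requires the missForest package):
# imp <- missForest::missForest(augmented)$ximp
# and the eigenvector columns could be dropped again afterwards:
# imputed <- imp[, names(traits)]

dim(augmented)
```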

Value

The function returns a list containing both the original (incomplete) trait data and the imputed trait data; the imputed data are stored in the element imputed (see Examples).

Examples


library(funspace)

# The GSPFF_missing dataset includes more than 10,000 species, so preparing
# and imputing the full dataset takes a very long time.
# Select a small random subset of species instead:
selectSPS <- 200
set.seed(2)
subset_traits <- GSPFF_missing[sample(1:nrow(GSPFF_missing), selectSPS), ]

# Prune the phylogeny to the sampled species
deleteTips <- setdiff(phylo$tip.label, rownames(subset_traits))
subset_phylo <- ape::drop.tip(phylo, tip = deleteTips)

# Impute missing trait values, adding sampled species absent from the tree
# to their genus
GSPFF_subset <- impute(traits = subset_traits, phylo = subset_phylo, addingSpecies = TRUE)

# Build a functional space from a PCA of the imputed traits
pca <- princomp(GSPFF_subset$imputed)
funtest <- funspace(pca)
plot(funtest, pnt = TRUE, pnt.cex = 0.2, arrows = TRUE)
summary(funtest)



[Package funspace version 0.2.2 Index]