cor_phylo {phyr} | R Documentation |
Correlations among multiple variates with phylogenetic signal
Description
This function calculates Pearson correlation coefficients for multiple continuous
variates that may have phylogenetic signal, allowing users to specify measurement
error as the standard error of variate values at the tips of the phylogenetic tree.
Phylogenetic signal for each variate is estimated from the data assuming that variate
evolution is given by a Ornstein-Uhlenbeck process. Thus, the function allows the
estimation of phylogenetic signal in multiple variates while incorporating
correlations among variates. It is also possible to include independent variables
(covariates) for each variate to remove possible confounding effects.
cor_phylo
returns the correlation matrix for variate values, estimates
of phylogenetic signal for each variate, and regression coefficients for
independent variables affecting each variate.
Usage
cor_phylo(variates, species, phy,
covariates = NULL,
meas_errors = NULL,
data = sys.frame(sys.parent()),
REML = TRUE,
method = c("nelder-mead-r", "bobyqa",
"subplex", "nelder-mead-nlopt", "sann"),
no_corr = FALSE,
constrain_d = FALSE,
lower_d = 1e-7,
rel_tol = 1e-6,
max_iter = 1000,
sann_options = NULL,
verbose = FALSE,
rcond_threshold = 1e-10,
boot = 0,
keep_boots = c("fail", "none", "all"))
## S3 method for class 'cor_phylo'
boot_ci(mod, refits = NULL, alpha = 0.05, ...)
## S3 method for class 'cor_phylo'
print(x, digits = max(3, getOption("digits") - 3), ...)
Arguments
variates |
A formula or a matrix specifying variates between which correlations
are being calculated.
The formula should be one-sided of the form |
species |
A one-sided formula implicating the variable inside |
phy |
Either a phylogeny of class |
covariates |
A list specifying covariate(s) for each variate.
The list can contain only two-sided formulas or matrices.
Formulas should be of the typical form: |
meas_errors |
A list or matrix containing standard errors for each variate.
If a list, it must contain only two-sided formulas like those for |
data |
An optional data frame, list, or environment that contains the
variables in the model. By default, variables are taken from the environment
from which |
REML |
Whether REML (versus ML) should be used for model fitting.
Defaults to |
method |
Method of optimization using |
no_corr |
A single logical for whether to make all correlations zero.
Running |
constrain_d |
If |
lower_d |
Lower bound on the phylogenetic signal parameter.
Defaults to |
rel_tol |
A control parameter dictating the relative tolerance for convergence
in the optimization. Defaults to |
max_iter |
A control parameter dictating the maximum number of iterations
in the optimization. Defaults to |
sann_options |
A named list containing the control parameters for SANN
minimization.
This is only relevant if |
verbose |
If |
rcond_threshold |
Threshold for the reciprocal condition number of two
matrices inside the log likelihood function.
Increasing this threshold makes the optimization process more strongly
"bounce away" from badly conditioned matrices and can help with convergence
and with estimates that are nonsensical.
Defaults to |
boot |
Number of parametric bootstrap replicates. Defaults to |
keep_boots |
Character specifying when to output data (indices, convergence codes,
and simulated variate data) from bootstrap replicates.
This is useful for troubleshooting when one or more bootstrap replicates
fails to converge or outputs ridiculous results.
Setting this to |
mod |
|
refits |
One or more |
alpha |
Alpha used for the confidence intervals. Defaults to |
... |
arguments passed to and from other methods. |
x |
an object of class |
digits |
the number of digits to be printed. |
Value
cor_phylo
returns an object of class cor_phylo
:
call |
The matched call. |
corrs |
The |
d |
Values of |
B |
A matrix of regression-coefficient estimates, SE, Z-scores, and P-values, respectively. Rownames indicate which coefficient it refers to. |
B_cov |
Covariance matrix for regression coefficients. |
logLik |
The log likelihood for either the restricted likelihood
( |
AIC |
AIC for either the restricted likelihood ( |
BIC |
BIC for either the restricted likelihood ( |
niter |
Number of iterations the optimizer used. |
convcode |
Conversion code for the optimizer.
This number is
For more information on the nlopt return codes, see https://nlopt.readthedocs.io/en/latest/NLopt_Reference/#return-values. |
rcond_vals |
Reciprocal condition numbers for two matrices inside
the log likelihood function. These are provided to potentially help guide
the changing of the |
bootstrap |
A list of bootstrap output, which is simply |
boot_ci
returns a list of confidence intervals with the following fields:
corrs
-
Estimates of correlations. This is a matrix the values above the diagonal being the upper limits and values below being the lower limits.
d
Phylogenetic signals.
B0
Coefficient estimates.
B_cov
Coefficient covariances.
Methods (by generic)
-
boot_ci
: returns bootstrapped confidence intervals from acor_phylo
object -
print
: printscor_phylo
objects
Walkthrough
For the case of two variables, the function estimates parameters for the model of the form, for example,
X[1] = B[1,0] + B[1,1] * u[1,1] + \epsilon[1]
X[2] = B[2,0] + B[2,1] * u[2,1] + \epsilon[2]
\epsilon ~ Gaussian(0, V)
where B[1,0]
, B[1,1]
, B[2,0]
, and B[2,1]
are regression
coefficients, and V
is a variance-covariance matrix containing the correlation
coefficient r, parameters of the OU process d1
and d2
, and diagonal
matrices M1
and M2
of measurement standard errors for X[1]
and
X[2]
. The matrix V
is 2n x 2n
, with n x n
blocks given by
V[1,1] = C[1,1](d1) + M1
V[1,2] = C[1,2](d1,d2)
V[2,1] = C[2,1](d1,d2)
V[2,2] = C[2,2](d2) + M2
where C[i,j](d1,d2)
are derived from phy
under the assumption of joint
OU evolutionary processes for each variate (see Zheng et al. 2009). This formulation
extends in the obvious way to more than two variates.
Author(s)
Anthony R. Ives, Lucas A. Nell
References
Zheng, L., A. R. Ives, T. Garland, B. R. Larget, Y. Yu, and K. F. Cao. 2009. New multivariate tests for phylogenetic signal and trait correlations applied to ecophysiological phenotypes of nine Manglietia species. Functional Ecology 23:1059–1069.
Examples
#
# ## Simple example using data without correlations or phylogenetic
# ## signal. This illustrates the structure of the input data.
#
# set.seed(10)
# phy <- ape::rcoal(10, tip.label = 1:10)
# data_df <- data.frame(
# species = phy$tip.label,
# # variates:
# par1 = rnorm(10),
# par2 = rnorm(10),
# par3 = rnorm(10),
# # covariate for par2:
# cov2 = rnorm(10, mean = 10, sd = 4),
# # measurement error for par1 and par2, respectively:
# se1 = 0.2,
# se2 = 0.4
# )
# data_df$par2 <- data_df$par2 + 0.5 * data_df$cov2
#
#
# # cor_phylo(variates = ~ par1 + par2 + par3,
# # covariates = list(par2 ~ cov2),
# # meas_errors = list(par1 ~ se1, par2 ~ se2),
# # species = ~ species,
# # phy = phy,
# # data = data_df)
#
# # If you've already created matrices/lists...
# X <- as.matrix(data_df[,c("par1", "par2", "par3")])
# U <- list(par2 = cbind(cov2 = data_df$cov2))
# M <- cbind(par1 = data_df$se1, par2 = data_df$se2)
#
# # ... you can also use those directly
# # (notice that I'm inputting an object for `species`
# # bc I ommitted `data`):
# # cor_phylo(variates = X, species = data_df$species,
# # phy = phy, covariates = U,
# # meas_errors = M)
#
#
#
#
# ## Simulation example for the correlation between two variables. The example
# ## compares the estimates of the correlation coefficients from cor_phylo when
# ## measurement error is incorporated into the analyses with three other cases:
# ## (i) when measurement error is excluded, (ii) when phylogenetic signal is
# ## ignored (assuming a "star" phylogeny), and (iii) neither measurement error
# ## nor phylogenetic signal are included.
#
# # In the simulations, variable 2 is associated with a single independent variable.
#
# library(ape)
#
# set.seed(1)
# # Set up parameter values for simulating data
# n <- 50
# phy <- rcoal(n, tip.label = 1:n)
# trt_names <- paste0("par", 1:2)
#
# R <- matrix(c(1, 0.7, 0.7, 1), nrow = 2, ncol = 2)
# d <- c(0.3, 0.95)
# B2 <- 1
#
# Se <- c(0.2, 1)
# M <- matrix(Se, nrow = n, ncol = 2, byrow = TRUE)
# colnames(M) <- trt_names
#
# # Set up needed matrices for the simulations
# p <- length(d)
#
# star <- stree(n)
# star$edge.length <- array(1, dim = c(n, 1))
# star$tip.label <- phy$tip.label
#
# Vphy <- vcv(phy)
# Vphy <- Vphy/max(Vphy)
# Vphy <- Vphy/exp(determinant(Vphy)$modulus[1]/n)
#
# tau <- matrix(1, nrow = n, ncol = 1) %*% diag(Vphy) - Vphy
# C <- matrix(0, nrow = p * n, ncol = p * n)
# for (i in 1:p) for (j in 1:p) {
# Cd <- (d[i]^tau * (d[j]^t(tau)) * (1 - (d[i] * d[j])^Vphy))/(1 - d[i] * d[j])
# C[(n * (i - 1) + 1):(i * n), (n * (j - 1) + 1):(j * n)] <- R[i, j] * Cd
# }
# MM <- matrix(M^2, ncol = 1)
# V <- C + diag(as.numeric(MM))
#
# # Perform a Cholesky decomposition of Vphy. This is used to generate phylogenetic
# # signal: a vector of independent normal random variables, when multiplied by the
# # transpose of the Cholesky deposition of Vphy will have covariance matrix
# # equal to Vphy.
# iD <- t(chol(V))
#
# # Perform Nrep simulations and collect the results
# Nrep <- 100
# cor.list <- matrix(0, nrow = Nrep, ncol = 1)
# cor.noM.list <- matrix(0, nrow = Nrep, ncol = 1)
# cor.noP.list <- matrix(0, nrow = Nrep, ncol = 1)
# cor.noMP.list <- matrix(0, nrow = Nrep, ncol = 1)
# d.list <- matrix(0, nrow = Nrep, ncol = 2)
# d.noM.list <- matrix(0, nrow = Nrep, ncol = 2)
# B.list <- matrix(0, nrow = Nrep, ncol = 3)
# B.noM.list <- matrix(0, nrow = Nrep, ncol = 3)
# B.noP.list <- matrix(0, nrow = Nrep, ncol = 3)
#
#
# set.seed(2)
# for (rep in 1:Nrep) {
#
# XX <- iD %*% rnorm(2 * n)
# X <- matrix(XX, n, p)
# colnames(X) <- trt_names
#
# U <- list(cbind(rnorm(n, mean = 2, sd = 10)))
# names(U) <- trt_names[2]
#
# X[,2] <- X[,2] + B2[1] * U[[1]][,1] - B2[1] * mean(U[[1]][,1])
#
# # Call cor_phylo with (i) phylogeny and measurement error,
# # (ii) just phylogeny,
# # and (iii) just measurement error
# z <- cor_phylo(variates = X,
# covariates = U,
# meas_errors = M,
# phy = phy,
# species = phy$tip.label)
# z.noM <- cor_phylo(variates = X,
# covariates = U,
# phy = phy,
# species = phy$tip.label)
# z.noP <- cor_phylo(variates = X,
# covariates = U,
# meas_errors = M,
# phy = star,
# species = phy$tip.label)
#
# cor.list[rep] <- z$corrs[1, 2]
# cor.noM.list[rep] <- z.noM$corrs[1, 2]
# cor.noP.list[rep] <- z.noP$corrs[1, 2]
# cor.noMP.list[rep] <- cor(cbind(
# lm(X[,1] ~ 1)$residuals,
# lm(X[,2] ~ U[[1]])$residuals))[1,2]
#
# d.list[rep, ] <- z$d
# d.noM.list[rep, ] <- z.noM$d
#
# B.list[rep, ] <- z$B[,1]
# B.noM.list[rep, ] <- z.noM$B[,1]
# B.noP.list[rep, ] <- z.noP$B[,1]
# }
#
# correlation <- rbind(R[1, 2], mean(cor.list), mean(cor.noM.list),
# mean(cor.noP.list), mean(cor.noMP.list))
# rownames(correlation) <- c("True", "With M and Phy", "Without M",
# "Without Phy", "Without Phy or M")
#
# signal.d <- rbind(d, colMeans(d.list), colMeans(d.noM.list))
# rownames(signal.d) <- c("True", "With M and Phy", "Without M")
#
# est.B <- rbind(c(0, 0, B2), colMeans(B.list),
# colMeans(B.noM.list[-39,]), # 39th rep didn't converge
# colMeans(B.noP.list))
# rownames(est.B) <- c("True", "With M and Phy", "Without M", "Without Phy")
# colnames(est.B) <- rownames(z$B)
#
# # Example simulation output:
#
# correlation
# # [,1]
# # True 0.7000000
# # With M and Phy 0.6943712
# # Without M 0.2974162
# # Without Phy 0.3715406
# # Without Phy or M 0.3291473
#
# signal.d
# # [,1] [,2]
# # True 0.3000000 0.9500000
# # With M and Phy 0.3025853 0.9422067
# # Without M 0.2304527 0.4180208
#
# est.B
# # par1_0 par2_0 par2_cov_1
# # True 0.000000000 0.0000000 1.0000000
# # With M and Phy -0.008838245 0.1093819 0.9995058
# # Without M -0.008240453 0.1142330 0.9995625
# # Without Phy 0.002933341 0.1096578 1.0028474