looCV {RRPP}    R Documentation
Diagnostic cross-validation tool for ordination based on fitted values
Description
This function performs a leave-one-out cross-validation estimate of ordination scores, which helps determine whether apparent "group differences" in ordination plots arise merely from data dimensionality.
Usage
looCV(fit, ...)
Arguments
fit: A linear model fit obtained from lm.rrpp.
...: Arguments passed to
Details
The function follows the strategy of Thioulouse et al. (2021) and performs N
ordinations for N observations. In each ordination, one observation is left out
of the estimation of the linear model coefficients, and the data vector of the
left-out observation is then projected onto the eigenvectors of the fitted values
obtained without it. This is a leave-one-out cross-validation (jackknife) strategy.
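The leave-one-out loop described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the RRPP implementation: the helper name loo_scores_sketch is hypothetical, Y is mean-centered once using all observations, eigenvectors are taken from a singular value decomposition of the fitted values, and the arbitrary signs of each leave-one-out eigenvector are matched to the observed ones before projection.

```r
# Hypothetical helper (not part of RRPP): leave-one-out projection scores
# Y: n x p data matrix; X: n x q design matrix; k: number of axes to keep
loo_scores_sketch <- function(Y, X, k = 2) {
  Y <- as.matrix(Y)
  X <- as.matrix(X)
  n <- nrow(Y)
  # Assumption: center Y once, using all n observations
  Yc <- sweep(Y, 2, colMeans(Y))
  # Observed case: fitted values H Y, then their right singular vectors
  V <- svd(qr.fitted(qr(X), Yc))$v[, seq_len(k), drop = FALSE]
  observed <- Yc %*% V
  cv <- matrix(NA_real_, n, k)
  for (i in seq_len(n)) {
    # Refit the linear model without observation i
    fitted_i <- qr.fitted(qr(X[-i, , drop = FALSE]), Yc[-i, , drop = FALSE])
    Vi <- svd(fitted_i)$v[, seq_len(k), drop = FALSE]
    # Singular-vector signs are arbitrary; match each axis to the observed one
    s <- sign(colSums(Vi * V))
    s[s == 0] <- 1
    Vi <- sweep(Vi, 2, s, "*")
    # Project the left-out observation onto the leave-one-out eigenvectors
    cv[i, ] <- Yc[i, , drop = FALSE] %*% Vi
  }
  list(observed = observed, cv = cv)
}
```

For example, loo_scores_sketch(iris[, 1:4], model.matrix(~ Species, data = iris)) returns observed and cross-validated scores for two axes. With strong group structure, the two sets of scores would be expected to stay close; with random noise and many variables, they would be expected to diverge.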
The purpose of this diagnostic tool is to determine whether apparent "group differences"
in an ordination plot (made with the function ordinate) arise from high-dimensional
data (the number of variables exceeds the number of observations) rather than from real
differences. Apparent group differences are common for high-dimensional data, in which
variables far outnumber observations (Cardini et al., 2019). However, leave-one-out
cross-validation can help reveal whether an observed visual difference is spurious.
This function differs from the strategy of Thioulouse et al. (2021) in two important
ways. First, it uses the linear model design from an lm.rrpp fit, which can contain
any number of independent variables rather than a single factor for groups. Second,
after obtaining leave-one-out cross-validated scores, it performs a Procrustes alignment
between the cross-validated scores and the "observed" (real) scores, which minimizes
the summed squared distances between the alternative ordinations. This latter step
ensures that the two sets of scores are compared in a common orientation.
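A Procrustes alignment of the kind described above can be sketched with the classical orthogonal Procrustes solution, in which the rotation is obtained from a singular value decomposition. This is an illustrative assumption rather than RRPP's code: the helper name procrustes_align_sketch is hypothetical, both score matrices are mean-centered, no scaling is applied, and reflections are permitted.

```r
# Hypothetical helper (not part of RRPP): orthogonal Procrustes alignment
# Aligns `moving` (e.g., cross-validated scores) to `target` (observed scores)
procrustes_align_sketch <- function(moving, target) {
  # Remove translation by mean-centering both configurations
  moving <- sweep(as.matrix(moving), 2, colMeans(moving))
  target <- sweep(as.matrix(target), 2, colMeans(target))
  # SVD of t(moving) %*% target = U D V'; the optimal orthogonal matrix is U V'
  s <- svd(crossprod(moving, target))
  R <- s$u %*% t(s$v)
  aligned <- moving %*% R
  # Summed squared distances between aligned and target points
  list(aligned = aligned, rotation = R, ss = sum((target - aligned)^2))
}
```

In this sketch, moving would hold the cross-validated scores and target the observed scores; a small ss indicates that the two ordinations agree up to translation and rotation.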
The type = "PC" plot from plot.lm.rrpp has the same scores as those obtained from
ordinate(Y, A = H), where Y is a matrix of data and H is a hat matrix (which can be
calculated from plot.lm.rrpp output). This function updates H for every case in which
one row of Y is left out, meaning that the rotation matrix from ordinate is updated
N times. If H is robust to the dropped data and the corresponding rows of the design
matrix, the result will be similar to the original ordination. If apparent group
differences are spurious, H will tend to change, as will the data projections.
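Assuming that Y is mean-centered, that X is a full-rank design matrix, and that a subscript (-i) denotes dropping row i (notation introduced here for illustration; the exact centering and scaling used by ordinate may differ), the observed and leave-one-out quantities described above can be written as:

```latex
\begin{aligned}
\text{observed:}\quad & H = X(X^\top X)^{-1}X^\top,\quad \hat{Y} = HY,\quad
  \hat{Y}^\top\hat{Y} = V\Lambda V^\top,\quad Z = YV\\
\text{leave out } i:\quad & H_{(-i)} = X_{(-i)}\bigl(X_{(-i)}^\top X_{(-i)}\bigr)^{-1}X_{(-i)}^\top,\quad
  \hat{Y}_{(-i)} = H_{(-i)}Y_{(-i)},\\
& \hat{Y}_{(-i)}^\top\hat{Y}_{(-i)} = V_{(-i)}\Lambda_{(-i)}V_{(-i)}^\top,\quad
  z_i^\top = y_i^\top V_{(-i)}
\end{aligned}
```

Here y_i^T is row i of Y, and each z_i^T is one row of the cross-validated score matrix, which is then Procrustes-aligned to the observed scores Z.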
The functions summary.looCV and plot.looCV are essential for evaluating results.
These support functions compare eigenvalues and projected scores between the observed
and cross-validated cases.
This function should be viewed as a diagnostic tool, not as a data transformation tool. The cross-validated scores do not retain the Euclidean distances among observations, which could cause problems in any analysis that uses cross-validated scores in place of the data.
Value
An object of class looCV is a list containing the following components:
d: List of eigenvalues for the observed and cross-validated cases.
scores: List of principal component scores for the observed and cross-validated cases.
Author(s)
Michael Collyer
References
Thioulouse, J., Renaud, S., Dufour, A. B., & Dray, S. (2021). Overcoming the spurious groups problem in between-group PCA. Evolutionary Biology, In press.
Cardini, A., O’Higgins, P., & Rohlf, F. J. (2019). Seeing distinct groups where there are none: spurious patterns from between-group PCA. Evolutionary Biology, 46(4), 303-316.
Examples
library(RRPP)

# Example with real group differences
data(Pupfish)
fit <- lm.rrpp(coords ~ Pop * Sex, data = Pupfish, iter = 0)
CV1 <- looCV(fit)   # N leave-one-out ordinations
summary(CV1)        # compare observed and cross-validated results
group <- interaction(Pupfish$Pop, Pupfish$Sex)
plot(CV1, flip = 1, pch = 19, col = group)
# Example with apparent but not real group differences:
# random noise with the same dimensions as Pupfish$coords
n <- NROW(Pupfish$coords)
p <- NCOL(Pupfish$coords)
set.seed(1001)
Yr <- matrix(rnorm(n * p), n, p) # random noise
fit2 <- lm.rrpp(Yr ~ Pop * Sex, data = Pupfish, iter = 0)
CV2 <- looCV(fit2)
summary(CV2)
plot(CV2, pch = 19, col = group) # same grouping factor as above