manova.update {RRPP} | R Documentation |
MANOVA update for lm.rrpp model fits
Description
Function updates a lm.rrpp fit to add $MANOVA, which like $ANOVA, provides statistics or matrices typically associated with multivariate analysis of variance (MANOVA).
MANOVA statistics or sums of squares and cross-products (SSCP) matrices
are calculated over the random permutations of a lm.rrpp
fit. SSCP matrices are
computed, as are the inverse of R times H (invR.H), where R is a SSCP
for the residuals or random effects and H is
the difference between SSCP matrices of full and reduced models
(see below). From invR.H,
MANOVA statistics are calculated, including Roy's maximum root
(eigenvalue), Pillai trace, Hotelling-Lawley trace,
and Wilks lambda (via summary.manova.lm.rrpp
).
The manova.update to add $MANOVA to lm.rrpp
fits
requires more computation time than the $ANOVA
statistics that are computed automatically in lm.rrpp
.
Generally, the same inferential conclusions will
be found with either approach, when observations outnumber response
variables. For high-dimensional data (more variables
than observations) data are projected into a Euclidean space of
appropriate dimensions (rank of residual covariance matrix).
One can vary the tolerance for eigenvalue decay or specify the number
of PCs, if a smaller set of PCs than the maximum is desired.
This is advised if there is strong correlation among variables
(the data space could be simplified to fewer dimensions), as spurious
results are possible. Because distributions of MANOVA stats
can be generated from the random permutations,
there is no need to approximate F-values, like with parametric MANOVA.
By restricting analysis to the real, positive eigenvalues calculated,
all statistics can be calculated (but Wilks lambda, as a product but
not a trace, might be less reliable as variable number approaches
the number of observations).
ANOVA vs. MANOVA
Two SSCP matrices are calculated for each linear model effect, for every random permutation: R (Residuals or Random effects) and H, the difference between SSCPs for "full" and "reduced" models. (Full models contain and reduced models lack the effect tested; SSCPs are hypothesized to be the same under a null hypothesis, if there is no effect. The difference, H, would have a trace of 0 if the null hypothesis were true.) In RRPP, ANOVA and MANOVA correspond to two different ways to calculate statistics from R and H matrices.
ANOVA statistics are those that find the trace of R and H SSCP matrices before calculating subsequent statistics, including sums of squares (SS), mean squares (MS), and F-values. These statistics can be calculated with univariate data and provide univariate-like statistics for multivariate data. These statistics are dispersion measures only (covariances among variables do not contribute) and are the same as "distance-based" stats proposed by Goodall (1991) and Anderson (2001). MANOVA stats require multivariate data and are implicitly affected by variable covariances. For MANOVA, the inverse of R times H (invR.H) is first calculated for each effect, then eigen-analysis is performed on these matrix products. Multivariate statistics are calculated from the positive, real eigenvalues. In general, inferential conclusions will be similar with either approach, but effect sizes might differ.
Two important differences between manova.update and
summary.manova
(for lm
objects)
are that manova.update
does not attempt to normalize residual SSCP matrices (unneeded
for non-parametric statistical solutions) and (2) uses a generalized
inverse of the residual SSCP, if needed, when the number of
variables could render eigen-analysis problematic. This approach
is consistent
with covariance regularization methods that attempt to make covariance
matrices positive-definite for calculating model likelihoods or
multivariate statistics. If the number of observations far exceeds
the number of response variables, observed statistics from
manova.update and
summary.manova
will be quite similar. If the
number of response variables approaches or exceeds the number
of observations, manova.update
statistics will be much more reliable.
ANOVA tables are generated by anova.lm.rrpp
on
lm.rrpp fits and MANOVA tables are generated
by summary.manova.lm.rrpp
, after running
manova.update on lm.rrpp fits.
Currently, mixed model effects are only possible with $ANOVA statistics, not $MANOVA.
More detail is found in the vignette, ANOVA versus MANOVA.
Usage
manova.update(
fit,
error = NULL,
tol = 1e-07,
PC.no = NULL,
print.progress = TRUE,
verbose = NULL
)
Arguments
fit |
Linear model fit from |
error |
An optional character string to define R matrices used to calculate invR.H. (Currently only Residuals can be used and this argument defaults to NULL. Future versions will update this argument.) |
tol |
A tolerance value for culling data dimensions to prevent spurious results. The distribution of eigenvalues for the data will be examined and if the decay becomes less than the tolerance, the data will be truncated to principal components ahead of this point. This will possibly prevent spurious results calculated from eigenvalues near 0. If tol = 0, all possible PC axes are used, which is likely not a problem if observations outnumber variables. If tol = 0 and the number of variables exceeds the number of observations, the value of tol will be made slightly positive to prevent problems with eigen-analysis. |
PC.no |
A value that, if not NULL, can override the tolerance argument, and forces a desired number of data PCs to use for analysis. If a value larger than the possible number of PCs is chosen, the full compliment of PCs (the full data space) will be used. If a number larger than tol would permit is chosen, the minimum number of PCs between the tol argument and PC.no argument is returned. |
print.progress |
A logical value to indicate whether a progress bar should be printed to the screen. This is helpful for long-running analyses. |
verbose |
Either a NULL or logical value for whether to retain
all MANOVA result (if TRUE). If NULL, the verbose argument used for
the |
Value
An object of class lm.rrpp
is updated to include
class manova.lm.rrpp
, and the object,
$MANOVA, which includes
SSCP |
Terms and Model SSCP matrices. |
invR.H |
The inverse of the residuals SSCP times the H SSCP. |
eigs |
The eigenvalues of invR.H. |
e.rank |
Rank of the error (residuals) covariance matrix. Currently NULL only. |
PCA |
Principal component analysis of data, using either tol or PC.no. |
manova.pc.dims |
Resulting number of PC vectors in the analysis. |
e.rank |
Rank of the residual (error) covariance matrix, irrespective of the number of dimensions used for analysis. |
Author(s)
Michael Collyer
References
Goodall, C.R. 1991. Procrustes methods in the statistical analysis of shape. Journal of the Royal Statistical Society B 53:285-339.
Anderson MJ. 2001. A new method for non-parametric multivariate analysis of variance. Austral Ecology 26: 32-46.
Examples
## Not run:
# Body Shape Analysis (Multivariate) ----------------
data(Pupfish)
# Although not recommended as a practice, this example will use only
# three principal components of body shape for demonstration.
# A larger number of random permutations should also be used.
Pupfish$shape <- ordinate(Pupfish$coords)$x[, 1:3]
Pupfish$logSize <- log(Pupfish$CS)
fit <- lm.rrpp(shape ~ logSize + Sex, SS.type = "I",
data = Pupfish, print.progress = FALSE, iter = 499)
summary(fit, formula = FALSE)
anova(fit) # ANOVA table
# MANOVA
fit.m <- manova.update(fit, print.progress = FALSE, tol = 0.001)
summary(fit.m, test = "Roy")
summary(fit.m, test = "Pillai")
fit.m$MANOVA$eigs$obs # observed eigenvalues
fit.m$MANOVA$SSCP$obs # observed SSCP
fit.m$MANOVA$invR.H$obs # observed invR.H
# Distributions of test statistics
summ.roy <- summary(fit.m, test = "Roy")
dens <- apply(summ.roy$rand.stats, 1, density)
par(mfcol = c(1, length(dens)))
for(i in 1:length(dens)) {
plot(dens[[i]], xlab = "Roy max root", ylab = "Density",
type = "l", main = names(dens)[[i]])
abline(v = summ.roy$rand.stats[1, i], col = "red")
}
par(mfcol = c(1,1))
## End(Not run)