lm.rrpp {RRPP} | R Documentation |
Linear Model Evaluation with a Randomized Residual Permutation Procedure
Description
Function performs a linear model fit over many random permutations of data, using a randomized residual permutation procedure.
Usage
lm.rrpp(
f1,
iter = 999,
turbo = FALSE,
seed = NULL,
int.first = FALSE,
RRPP = TRUE,
full.resid = FALSE,
block = NULL,
SS.type = c("I", "II", "III"),
data = NULL,
Cov = NULL,
print.progress = FALSE,
Parallel = FALSE,
verbose = FALSE,
...
)
Arguments
f1 |
A formula for the linear model (e.g., y~x1+x2). Can also
be a linear model fit
from |
iter |
Number of iterations for significance testing |
turbo |
A logical value that if TRUE, suppresses coefficient estimation
in every random permutation. This will affect subsequent analyses that
require random coefficients (see |
seed |
An optional argument for setting the seed for random permutations of the resampling procedure. If left NULL (the default), the exact same P-values will be found for repeated runs of the analysis (with the same number of iterations). If seed = "random", a random seed will be used, and P-values will vary. One can also specify an integer for specific seed values, which might be of interest for advanced users. |
int.first |
A logical value to indicate if interactions of first main effects should precede subsequent main effects |
RRPP |
A logical value indicating whether residual randomization should be used for significance testing |
full.resid |
A logical value for whether to use the full model residuals, only (sensu ter Braak, 1992). This only works if RRPP = TRUE and SS.type = III. Rather than permuting reduced model residuals, this option permutes only the full model residuals in every random permutation of RRPP. |
block |
An optional factor for blocks within which to restrict resampling permutations. |
SS.type |
A choice between type I (sequential), type II (hierarchical), or type III (marginal) sums of squares and cross-products computations. |
data |
A data frame for the function environment, see
|
Cov |
An optional argument for including a covariance matrix to address the non-independence of error in the estimation of coefficients (via GLS). If included, any weights are ignored. |
print.progress |
A logical value to indicate whether a progress bar should be printed to the screen. This is helpful for long-running analyses. |
Parallel |
Either a logical value to indicate whether parallel processing
should be used, a numeric value to indicate the number of cores to use, or a predefined
socket cluster. This argument defines parallel processing via the |
verbose |
A logical value to indicate if all possible output from an analysis should be retained. Generally this should be FALSE, unless one wishes to extract, e.g., all possible terms, model matrices, QR decomposition, or random permutation schemes. |
... |
Arguments typically used in |
Details
The function fits a linear model using ordinary least squares (OLS) or generalized least squares (GLS) estimation of coefficients over any number of random permutations of the data. A permutation procedure that randomizes vectors of residuals is employed. This procedure can randomize two types of residuals: residuals from null models or residuals from an intercept model. The latter is the same as randomizing full values, and is referred to as as a full randomization permutation procedure (FRPP); the former uses the residuals from null models, which are defined by the type of sums of squares and cross-products (SSCP) sought in an analysis of variance (ANOVA), and is referred to as a randomized residual permutation procedure (RRPP). Types I, II, and III SSCPs are supported.
Users define the SSCP type, the permutation procedure type, whether a covariance matrix is included (GLS estimation), and a few arguments related to computations. Results comprise observed linear model results (coefficients, fitted values, residuals, etc.), random sums of squares (SS) across permutation iterations, and other parameters for performing ANOVA and other hypothesis tests, using empirically-derived probability distributions.
lm.rrpp
emphasizes estimation of standard deviates of observed
statistics as effect sizes
from distributions of random outcomes. When performing ANOVA, using
the anova
function,
the effect type (statistic choice) can be varied. See
anova.lm.rrpp
for more details. Please
recognize that the type of SS must be chosen prior to running
lm.rrpp
and not when applying anova
to the lm.rrpp
fit, as design matrices for the linear model
must be created first. Therefore, SS.type
is an argument for lm.rrpp
and effect.type is an argument for
anova.lm.rrpp
. If MANOVA
statistics are preferred, eigenvalues can be added with
manova.update
and statistics summarized with
summary.manova.lm.rrpp
. See manova.update
for examples.
The coef.lm.rrpp
function can be used to test the specific
coefficients of an lm.rrpp fit. The test
statistics are the distances (d), which are also standardized (Z-scores).
The Z-scores might be easier to compare,
as the expected values for random distances can vary among coefficient
vectors (Adams and Collyer 2016).
ANOVA vs. MANOVA
Two SSCP matrices are calculated for each linear model effect, for every random permutation: R (Residuals or Random effects) and H, the difference between SSCPs for "full" and "reduced" models. (Full models contain and reduced models lack the effect tested; SSCPs are hypothesized to be the same under a null hypothesis, if there is no effect. The difference, H, would have a trace of 0 if the null hypothesis were true.) In RRPP, ANOVA and MANOVA correspond to two different ways to calculate statistics from R and H matrices.
ANOVA statistics are those that find the trace of R and H SSCP matrices before calculating subsequent statistics, including sums of squares (SS), mean squares (MS), and F-values. These statistics can be calculated with univariate data and provide univariate-like statistics for multivariate data. These statistics are dispersion measures only (covariances among variables do not contribute) and are the same as "distance-based" stats proposed by Goodall (1991) and Anderson (2001). MANOVA stats require multivariate data and are implicitly affected by variable covariances. For MANOVA, the inverse of R times H (invR.H) is first calculated for each effect, then eigenanalysis is performed on these matrix products. Multivariate statistics are calculated from the positive, real eigenvalues. In general, inferential conclusions will be similar with either approach, but effect sizes might differ.
ANOVA tables are generated by anova.lm.rrpp
on lm.rrpp
fits and MANOVA tables are generated
by summary.manova.lm.rrpp
, after running manova.update
on lm.rrpp fits.
Currently, mixed model effects are only possible with $ANOVA statistics, not $MANOVA.
More detail is found in the vignette, ANOVA versus MANOVA.
Notes for RRPP 0.5.0 and subsequent versions
The output from lm.rrpp has changed, compared to previous versions.
First, the $LM
component of output no longer includes both OLS and GLS statistics,
when GLS fits are
performed. Only GLS statistics (coefficients, residuals, fitted values)
are provided
and noted with a "gls." tag. GLS statistics can include those calculated
when weights are input (similar to the lm
argument).
Unlike previous
versions, GLS and weighted LS statistics are not labeled differently,
as weighted
LS is one form of generalized LS estimation. Second, a new object,
$Models, is included
in output, which contains the linear model fits (lm
attributes ) for
all reduced and full models that are possible to estimate fits.
Notes for RRPP 0.3.1 and subsequent versions
F-values via RRPP are calculated with residual SS (RSS) found uniquely for any model terms, as per Anderson and ter Braak (2003). This method uses the random pseudo-data generated by each term's null (reduced) model, meaning RSS can vary across terms. Previous versions used an intercept-only model for generating random pseudo-data. This generally has appropriate type I error rates but can have elevated type I error rates if the observed RSS is small relative to total SS. Allowing term by term unique RSS alleviates this concern.
Value
An object of class lm.rrpp
is a list containing the
following
call |
The matched call. |
LM |
Linear Model objects, including data (Y), coefficients, design matrix (X), sample size (n), number of dependent variables (p), dimension of data space (p.prime), QR decomposition of the design matrix, fitted values, residuals, weights, offset, model terms, data (model) frame, random coefficients (through permutations), random vector distances for coefficients (through permutations), whether OLS or GLS was performed, and the mean for OLS and/or GLS methods. Note that the data returned resemble a model frame rather than a data frame; i.e., it contains the values used in analysis, which might have been transformed according to the formula. The response variables are always labeled Y.1, Y.2, ..., in this frame. |
ANOVA |
Analysis of variance objects, including the SS type, random SS outcomes, random MS outcomes, random R-squared outcomes, random F outcomes, random Cohen's f-squared outcomes, P-values based on random F outcomes, effect sizes for random outcomes, sample size (n), number of variables (p), and degrees of freedom for model terms (df). These objects are used to construct ANOVA tables. |
PermInfo |
Permutation procedure information, including the number of permutations (perms), The method of residual randomization (perm.method), and each permutation's sampling frame (perm.schedule), which is a list of reordered sequences of 1:n, for how residuals were randomized. |
Models |
Reduced and full model fits for every possible model combination, based on terms of the entire model, plus the method of SS estimation. |
Author(s)
Michael Collyer
References
Anderson MJ. 2001. A new method for non-parametric multivariate analysis of variance. Austral Ecology 26: 32-46.
Anderson MJ. and C.J.F. ter Braak. 2003. Permutation tests for multi-factorial analysis of variance. Journal of Statistical Computation and Simulation 73: 85-113.
Collyer, M.L., D.J. Sekora, and D.C. Adams. 2015. A method for analysis of phenotypic change for phenotypes described by high-dimensional data. Heredity. 115:357-365.
Adams, D.C. and M.L. Collyer. 2016. On the comparison of the strength of morphological integration across morphometric datasets. Evolution. 70:2623-2631.
Adams, D.C and M.L. Collyer. 2018. Multivariate phylogenetic anova: group-clade aggregation, biological challenges, and a refined permutation procedure. Evolution. 72:1204-1215.
ter Braak, C.J.F. 1992. Permutation versus bootstrap significance tests in
multiple regression and ANOVA. pp .79–86 In Bootstrapping and Related Techniques. eds K-H. Jockel,
G. Rothe & W. Sendler.Springer-Verlag, Berlin.
lm
for more on linear model fits.
See Also
procD.lm
and procD.pgls
within geomorph
;
Examples
## Not run:
# Examples use geometric morphometric data
# See the package, geomorph, for details about obtaining such data
data("PupfishHeads")
names(PupfishHeads)
# Head Size Analysis (Univariate)-------------------------------------------------------
fit <- lm.rrpp(log(headSize) ~ sex + locality/year, SS.type = "I",
data = PupfishHeads, print.progress = FALSE, iter = 999)
summary(fit)
anova(fit, effect.type = "F") # Maybe not most appropriate
anova(fit, effect.type = "Rsq") # Change effect type, but still not
# most appropriate
# Mixed-model approach (most appropriate, as year sampled is a random
# effect:
anova(fit, effect.type = "F", error = c("Residuals", "locality:year",
"Residuals"))
# Change to Type III SS
fit <- lm.rrpp(log(headSize) ~ sex + locality/year, SS.type = "III",
data = PupfishHeads, print.progress = FALSE, iter = 999,
verbose = TRUE)
summary(fit)
anova(fit, effect.type = "F", error = c("Residuals", "locality:year",
"Residuals"))
# Coefficients Test
coef(fit, test = TRUE)
# Predictions (holding alternative effects constant)
sizeDF <- data.frame(sex = c("Female", "Male"))
rownames(sizeDF) <- c("Female", "Male")
sizePreds <- predict(fit, sizeDF)
summary(sizePreds)
plot(sizePreds)
# Diagnostics plots of residuals
plot(fit)
# Body Shape Analysis (Multivariate) -----------
data(Pupfish)
names(Pupfish)
# Note:
dim(Pupfish$coords) # highly multivariate!
fit <- lm.rrpp(coords ~ log(CS) + Sex*Pop, SS.type = "I",
data = Pupfish, print.progress = FALSE, iter = 999,
verbose = TRUE)
summary(fit, formula = FALSE)
anova(fit)
coef(fit, test = TRUE)
# Predictions (holding alternative effects constant)
shapeDF <- expand.grid(Sex = levels(Pupfish$Sex),
Pop = levels(Pupfish$Pop))
rownames(shapeDF) <- paste(shapeDF$Sex, shapeDF$Pop, sep = ".")
shapeDF
shapePreds <- predict(fit, shapeDF)
summary(shapePreds)
summary(shapePreds, PC = TRUE)
# Plot prediction
plot(shapePreds, PC = TRUE)
plot(shapePreds, PC = TRUE, ellipse = TRUE)
# Diagnostics plots of residuals
plot(fit)
# PC-plot of fitted values
groups <- interaction(Pupfish$Sex, Pupfish$Pop)
plot(fit, type = "PC", pch = 19, col = as.numeric(groups))
# Regression-like plot
plot(fit, type = "regression", reg.type = "PredLine",
predictor = log(Pupfish$CS), pch=19,
col = as.numeric(groups))
# Body Shape Analysis (Distances) ----------
D <- dist(Pupfish$coords) # inter-observation distances
length(D)
Pupfish$D <- D
fitD <- lm.rrpp(D ~ log(CS) + Sex*Pop, SS.type = "I",
data = Pupfish, print.progress = FALSE, iter = 999)
# These should be the same:
summary(fitD, formula = FALSE)
summary(fit, formula = FALSE)
# GLS Example (Univariate) -----------
data(PlethMorph)
fitOLS <- lm.rrpp(TailLength ~ SVL, data = PlethMorph,
print.progress = FALSE, iter = 999)
fitGLS <- lm.rrpp(TailLength ~ SVL, data = PlethMorph, Cov = PlethMorph$PhyCov,
print.progress = FALSE, iter = 999)
anova(fitOLS)
anova(fitGLS)
sizeDF <- data.frame(SVL = sort(PlethMorph$SVL))
# Prediction plots
# By specimen
plot(predict(fitOLS, sizeDF)) # Correlated error
plot(predict(fitGLS, sizeDF)) # Independent error
# With respect to independent variable (using abscissa)
plot(predict(fitOLS, sizeDF), abscissa = sizeDF) # Correlated error
plot(predict(fitGLS, sizeDF), abscissa = sizeDF) # Independent error
# GLS Example (Multivariate) -----------
Y <- as.matrix(cbind(PlethMorph$TailLength,
PlethMorph$HeadLength,
PlethMorph$Snout.eye,
PlethMorph$BodyWidth,
PlethMorph$Forelimb,
PlethMorph$Hindlimb))
PlethMorph$Y <- Y
fitOLSm <- lm.rrpp(Y ~ SVL, data = PlethMorph,
print.progress = FALSE, iter = 999)
fitGLSm <- lm.rrpp(Y ~ SVL, data = PlethMorph,
Cov = PlethMorph$PhyCov,
print.progress = FALSE, iter = 999)
anova(fitOLSm)
anova(fitGLSm)
# Prediction plots
# By specimen
plot(predict(fitOLSm, sizeDF)) # Correlated error
plot(predict(fitGLSm, sizeDF)) # Independent error
# With respect to independent variable (using abscissa)
plot(predict(fitOLSm, sizeDF), abscissa = sizeDF) # Correlated error
plot(predict(fitGLSm, sizeDF), abscissa = sizeDF) # Independent error
## End(Not run)