R: Convenient ANOVA estimation for factorial designs

aov_car {afex}

R Documentation

Convenient ANOVA estimation for factorial designs

Description

These functions allow convenient specification of any type of ANOVAs (i.e., purely within-subjects ANOVAs, purely between-subjects ANOVAs, and mixed between-within or split-plot ANOVAs) for data in the long format (i.e., one observation per row). If the data has more than one observation per individual and cell of the design (e.g., multiple responses per condition), the data will be automatically aggregated. The default settings reproduce results from commercial statistical packages such as SPSS or SAS. aov_ez is called specifying the factors as character vectors, aov_car is called using a formula similar to aov specifying an error strata for the within-subject factor(s), and aov_4 is called with a lme4-like formula (all ANOVA functions return identical results). The returned object can be passed to e.g., emmeans for further analysis (e.g., follow-up tests, contrasts, plotting, etc.). These functions employ Anova (from the car package) to provide test of effects avoiding the somewhat unhandy format of car::Anova.

Usage

aov_car(
  formula,
  data,
  fun_aggregate = NULL,
  type = afex_options("type"),
  factorize = afex_options("factorize"),
  observed = NULL,
  anova_table = list(),
  include_aov = afex_options("include_aov"),
  return = afex_options("return_aov"),
  ...
)

aov_4(
  formula,
  data,
  observed = NULL,
  fun_aggregate = NULL,
  type = afex_options("type"),
  factorize = afex_options("factorize"),
  return = afex_options("return_aov"),
  anova_table = list(),
  include_aov = afex_options("include_aov"),
  ...,
  print.formula = FALSE
)

aov_ez(
  id,
  dv,
  data,
  between = NULL,
  within = NULL,
  covariate = NULL,
  observed = NULL,
  fun_aggregate = NULL,
  transformation,
  type = afex_options("type"),
  factorize = afex_options("factorize"),
  return = afex_options("return_aov"),
  anova_table = list(),
  include_aov = afex_options("include_aov"),
  ...,
  print.formula = FALSE
)

Arguments

`formula`	A formula specifying the ANOVA model similar to `aov` (for `aov_car` or similar to `lme4:lmer` for `aov_4`). Must include an error term (i.e., `Error(id/...)` for `aov_car` or `(...\|id)` for `aov_4`). Note that the within-subject factors do not need to be outside the Error term (this contrasts with `aov`). See Details.
`data`	A `data.frame` containing the data. Mandatory.
`fun_aggregate`	The function for aggregating the data before running the ANOVA if there is more than one observation per individual and cell of the design. The default `NULL` issues a warning if aggregation is necessary and uses `mean`. Pass `mean` directly to avoid the warning.
`type`	The type of sums of squares for the ANOVA. The default is given by `afex_options("type")`, which is initially set to 3. Passed to `Anova`. Possible values are `"II"`, `"III"`, `2`, or `3`.
`factorize`	logical. Should between subject factors be factorized (with note) before running the analysis. The default is given by `afex_options("factorize")`, which is initially `TRUE`. If one wants to run an ANCOVA, this needs to be set to `FALSE` (in which case centering on 0 is checked on numeric variables).
`observed`	`character` vector indicating which of the variables are observed (i.e, measured) as compared to experimentally manipulated. The default effect size reported (generalized eta-squared) requires correct specification of the observed (in contrast to manipulated) variables.
`anova_table`	`list` of further arguments passed to function producing the ANOVA table. Arguments such as `es` (effect size) or `correction` are passed to either `anova.afex_aov` or `nice`. Note that those settings can also be changed once an object of class `afex_aov` is created by invoking the `anova` method directly.
`include_aov`	Boolean. Allows suppressing the calculation of the aov object. If TRUE the aov model is part of the returned `afex_aov` object. `FALSE` (the default) prevents this potentially costly calculation. Especially for designs with larger N and within-subjects factors, this is highly advisable. Follow-up analyses using emmeans using the `univariate` model (which is not recommended) require the aov model and TRUE.
`return`	What should be returned? The default is given by `afex_options("return_aov")`, which is initially `"afex_aov"`, returning an S3 object of class `afex_aov` for which various methods exist (see there and below for more details). Other values are currently still supported for backward compatibility.
`...`	Further arguments passed to `fun_aggregate`.
`print.formula`	`aov_ez` and `aov_4` are wrapper for `aov_car`. This boolean argument indicates whether the formula in the call to `car.aov` should be printed.
`id`	`character` vector (of length 1) indicating the subject identifier column in `data`.
`dv`	`character` vector (of length 1) indicating the column containing the dependent variable in `data`.
`between`	`character` vector indicating the between-subject(s) factor(s)/column(s) in `data`. Default is `NULL` indicating no between-subjects factors.
`within`	`character` vector indicating the within-subject(s)(or repeated-measures) factor(s)/column(s) in `data`. Default is `NULL` indicating no within-subjects factors.
`covariate`	`character` vector indicating the between-subject(s) covariate(s) (i.e., column(s)) in `data`. Default is `NULL` indicating no covariates. Please note that `factorize` needs to be set to `FALSE` in case the covariate is numeric and should be treated as such.
`transformation`	In `aov_ez`, a `character` vector (of length 1) indicating the name of a transformation to apply to `dv` before fitting the model. If missing, no transformation is applied. In `aov_car` and `aov_4`, a response transformation may be incorporated in the left-hand side of `formula`.

Details

Details of ANOVA Specification

aov_ez will concatenate all between-subject factors using * (i.e., producing all main effects and interactions) and all covariates by + (i.e., adding only the main effects to the existing between-subject factors). The within-subject factors do fully interact with all between-subject factors and covariates. This is essentially identical to the behavior of SPSS's glm function.

The formulas for aov_car or aov_4 must contain a single Error term specifying the ID column and potential within-subject factors (you can use mixed for running mixed-effects models with multiple error terms). Factors outside the Error term are treated as between-subject factors (the within-subject factors specified in the Error term are ignored outside the Error term; in other words, it is not necessary to specify them outside the Error term, see Examples).
Suppressing the intercept (i.e, via 0 + or - 1) is ignored. Specific specifications of effects (e.g., excluding terms with - or using ^) could be okay but is not tested. Using the I or poly function within the formula is not tested and not supported!

To run an ANCOVA you need to set factorize = FALSE and make sure that all variables have the correct type (i.e., factors are factors and numeric variables are numeric and centered).

Note that the default behavior is to include calculation of the effect size generalized eta-squared for which all non-manipluated (i.e., observed) variables need to be specified via the observed argument to obtain correct results. When changing the effect size to "pes" (partial eta-squared) or "none" via anova_table this becomes unnecessary.

Factor contrasts will be set to "contr.sum" for all between-subject factors if default contrasts are not equal to "contr.sum" or attrib(factor, "contrasts") != "contr.sum". (within-subject factors are hard-coded "contr.sum".)

Statistical Issues

Type 3 sums of squares are default in afex. While some authors argue that so-called type 3 sums of squares are dangerous and/or problematic (most notably Venables, 2000), they are the default in many commercial statistical application such as SPSS or SAS. Furthermore, statisticians with an applied perspective recommend type 3 tests (e.g., Maxwell and Delaney, 2004). Consequently, they are the default for the ANOVA functions described here. For some more discussion on this issue see here.

Note that lower order effects (e.g., main effects) in type 3 ANOVAs are only meaningful with effects coding. Therefore, contrasts are set to contr.sum which ensures meaningful results. For a discussion of the other (non-recommended) coding schemes see here.

Follow-Up Contrasts and Post-Hoc Tests

The S3 object returned per default can be directly passed to emmeans::emmeans for further analysis. This allows to test any type of contrasts that might be of interest independent of whether or not this contrast involves between-subject variables, within-subject variables, or a combination thereof. The general procedure to run those contrasts is the following (see Examples for a full example):

Estimate an afex_aov object with the function returned here. For example: x <- aov_car(dv ~ a*b + (id/c), d)
Obtain a emmGrid-class object by running emmeans on the afex_aov object from step 1 using the factors involved in the contrast. For example: r <- emmeans(x, ~a:c)
Create a list containing the desired contrasts on the reference grid object from step 2. For example: con1 <- list(a_x = c(-1, 1, 0, 0, 0, 0), b_x = c(0, 0, -0.5, -0.5, 0, 1))
Test the contrast on the reference grid using contrast. For example: contrast(r, con1)
To control for multiple testing p-value adjustments can be specified. For example the Bonferroni-Holm correction: contrast(r, con1, adjust = "holm")

Note that emmeans allows for a variety of advanced settings and simplifications, for example: all pairwise comparison of a single factor using one command (e.g., emmeans(x, "a", contr = "pairwise")) or advanced control for multiple testing by passing objects to multcomp. A comprehensive overview of the functionality is provided in the accompanying vignettes (see here).

Since version 1.0, afex per default uses the multivariate model (i.e., the lm slot of the afex_aov object) for follow-up tests with emmeans. Compared to the univariate model (i.e., the aov slot), this can handle unbalanced data and addresses sphericity better. To use the older (and not recommended) model = "univariate" make sure to set include_aov = TRUE when estimating the ANOVA.

Starting with afex version 0.22, emmeans is not loaded/attached automatically when loading afex. Therefore, emmeans now needs to be loaded by the user via library("emmeans") or require("emmeans").

Methods for `afex_aov` Objects

A full overview over the methods provided for afex_aov objects is provided in the corresponding help page: afex_aov-methods. The probably most important ones for end-users are summary, anova, and nice.

The summary method returns, for ANOVAs containing within-subject (repeated-measures) factors with more than two levels, the complete univariate analysis: Results without df-correction, the Greenhouse-Geisser corrected results, the Hyunh-Feldt corrected results, and the results of the Mauchly test for sphericity.

The anova method returns a data.frame of class "anova" containing the ANOVA table in numeric form (i.e., the one in slot anova_table of a afex_aov). This method has arguments such as correction and es and can be used to obtain an ANOVA table with different correction than the one initially specified.

The nice method also returns a data.frame, but rounds most values and transforms them into characters for nice printing. Also has arguments like correction and es which can be used to obtain an ANOVA table with different correction than the one initially specified.

Value

aov_car, aov_4, and aov_ez are wrappers for Anova and aov, the return value is dependent on the return argument. Per default, an S3 object of class "afex_aov" is returned containing the following slots:

"anova_table": An ANOVA table of class c("anova", "data.frame").
"aov": aov object returned from aov (should not be used to evaluate significance of effects, but can be passed to emmeans for post-hoc tests).
"Anova": object returned from Anova, an object of class "Anova.mlm" (if within-subjects factors are present) or of class c("anova", "data.frame").
"lm": the object fitted with lm and passed to Anova (i.e., an object of class "lm" or "mlm"). Also returned if return = "lm".
"data": a list containing: (1) long (the possibly aggregated data in long format used for aov), wide (the data used to fit the lm object), and idata (if within-subject factors are present, the idata argument passed to car::Anova). Also returned if return = "data".

In addition, the object has the following attributes: "dv", "id", "within", "between", and "type".

The print method for afex_aov objects (invisibly) returns (and prints) the same as if return is "nice": a nice ANOVA table (produced by nice) with the following columns: Effect, df, MSE (mean-squared errors), F (potentially with significant symbols), ges (generalized eta-squared), p.

Functions

aov_4(): Allows definition of ANOVA-model using lme4::lmer-like Syntax (but still fits a standard ANOVA).
aov_ez(): Allows definition of ANOVA-model using character strings.

Note

Calculation of ANOVA models via aov (which is done per default) can be comparatively slow and produce comparatively large objects for ANOVAs with many within-subjects factors or levels. To avoid this calculation set include_aov = FALSE. You can also disable this globally with: afex_options(include_aov = FALSE)

The id variable and variables entered as within-subjects (i.e., repeated-measures) factors are silently converted to factors. Levels of within-subject factors are converted to valid variable names using make.names(...,unique=TRUE). Unused factor levels are silently dropped on all variables.

Contrasts attached to a factor as an attribute are probably not preserved and not supported.

The workhorse is aov_car. aov_4 and aov_ez only construe and pass an appropriate formula to aov_car. Use print.formula = TRUE to view this formula.

In contrast to aov aov_car assumes that all factors to the right of / in the Error term are belonging together. Consequently, Error(id/(a*b)) and Error(id/a*b) are identical (which is not true for aov).

Author(s)

Henrik Singmann

The design of these functions was influenced by ezANOVA from package ez.

References

Cramer, A. O. J., van Ravenzwaaij, D., Matzke, D., Steingroever, H., Wetzels, R., Grasman, R. P. P. P., ... Wagenmakers, E.-J. (2015). Hidden multiplicity in exploratory multiway ANOVA: Prevalence and remedies. Psychonomic Bulletin & Review, 1-8. doi:10.3758/s13423-015-0913-5

Maxwell, S. E., & Delaney, H. D. (2004). Designing Experiments and Analyzing Data: A Model-Comparisons Perspective. Mahwah, N.J.: Lawrence Erlbaum Associates.

Venables, W.N. (2000). Exegeses on linear models. Paper presented to the S-Plus User's Conference, Washington DC, 8-9 October 1998, Washington, DC. Available from: http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf

Examples


##########################
## 1: Specifying ANOVAs ##
##########################

# Example using a purely within-subjects design 
# (Maxwell & Delaney, 2004, Chapter 12, Table 12.5, p. 578):
data(md_12.1)
aov_ez("id", "rt", md_12.1, within = c("angle", "noise"), 
       anova_table=list(correction = "none", es = "none"))

# Default output
aov_ez("id", "rt", md_12.1, within = c("angle", "noise"))       


# examples using obk.long (see ?obk.long), a long version of the OBrienKaiser
# dataset (car package). Data is a split-plot or mixed design: contains both
# within- and between-subjects factors.
data(obk.long, package = "afex")

# estimate mixed ANOVA on the full design:
aov_car(value ~ treatment * gender + Error(id/(phase*hour)), 
        data = obk.long, observed = "gender")

aov_4(value ~ treatment * gender + (phase*hour|id), 
        data = obk.long, observed = "gender")

aov_ez("id", "value", obk.long, between = c("treatment", "gender"), 
        within = c("phase", "hour"), observed = "gender")

# the three calls return the same ANOVA table:
# Anova Table (Type 3 tests)
# 
# Response: value
#                         Effect          df   MSE         F  ges p.value
# 1                    treatment       2, 10 22.81    3.94 + .198    .055
# 2                       gender       1, 10 22.81    3.66 + .115    .085
# 3             treatment:gender       2, 10 22.81      2.86 .179    .104
# 4                        phase 1.60, 15.99  5.02 16.13 *** .151   <.001
# 5              treatment:phase 3.20, 15.99  5.02    4.85 * .097    .013
# 6                 gender:phase 1.60, 15.99  5.02      0.28 .003    .709
# 7       treatment:gender:phase 3.20, 15.99  5.02      0.64 .014    .612
# 8                         hour 1.84, 18.41  3.39 16.69 *** .125   <.001
# 9               treatment:hour 3.68, 18.41  3.39      0.09 .002    .979
# 10                 gender:hour 1.84, 18.41  3.39      0.45 .004    .628
# 11       treatment:gender:hour 3.68, 18.41  3.39      0.62 .011    .641
# 12                  phase:hour 3.60, 35.96  2.67      1.18 .015    .335
# 13        treatment:phase:hour 7.19, 35.96  2.67      0.35 .009    .930
# 14           gender:phase:hour 3.60, 35.96  2.67      0.93 .012    .449
# 15 treatment:gender:phase:hour 7.19, 35.96  2.67      0.74 .019    .646
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1
# 
# Sphericity correction method: GG 

# "numeric" variables are per default converted to factors (as long as factorize
# = TRUE):
obk.long$hour2 <- as.numeric(as.character(obk.long$hour))

# gives same results as calls before
aov_car(value ~ treatment * gender + Error(id/phase*hour2), 
        data = obk.long, observed = c("gender"))


# ANCOVA: adding a covariate (necessary to set factorize = FALSE)
aov_car(value ~ treatment * gender + age + Error(id/(phase*hour)), 
        data = obk.long, observed = c("gender", "age"), factorize = FALSE)

aov_4(value ~ treatment * gender + age + (phase*hour|id), 
        data = obk.long, observed = c("gender", "age"), factorize = FALSE)

aov_ez("id", "value", obk.long, between = c("treatment", "gender"), 
        within = c("phase", "hour"), covariate = "age", 
        observed = c("gender", "age"), factorize = FALSE)


# aggregating over one within-subjects factor (phase), with warning:
aov_car(value ~ treatment * gender + Error(id/hour), data = obk.long, 
        observed = "gender")

aov_ez("id", "value", obk.long, c("treatment", "gender"), "hour", 
       observed = "gender")

# aggregating over both within-subjects factors (again with warning),
# only between-subjects factors:
aov_car(value ~ treatment * gender + Error(id), data = obk.long, 
        observed = c("gender"))
aov_4(value ~ treatment * gender + (1|id), data = obk.long, 
      observed = c("gender"))
aov_ez("id", "value", obk.long, between = c("treatment", "gender"), 
       observed = "gender")

# only within-subject factors (ignoring between-subjects factors)
aov_car(value ~ Error(id/(phase*hour)), data = obk.long)
aov_4(value ~ (phase*hour|id), data = obk.long)
aov_ez("id", "value", obk.long, within = c("phase", "hour"))

### changing defaults of ANOVA table:

# no df-correction & partial eta-squared:
aov_car(value ~ treatment * gender + Error(id/(phase*hour)), 
        data = obk.long, anova_table = list(correction = "none", es = "pes"))

# no df-correction and no MSE
aov_car(value ~ treatment * gender + Error(id/(phase*hour)), 
        data = obk.long,observed = "gender", 
        anova_table = list(correction = "none", MSE = FALSE))

# add p-value adjustment for all effects (see Cramer et al., 2015, PB&R)
aov_ez("id", "value", obk.long, between = "treatment", 
       within = c("phase", "hour"), 
       anova_table = list(p_adjust_method = "holm"))


###########################
## 2: Follow-up Analysis ##
###########################

# use data as above
data(obk.long, package = "afex")

# 1. obtain afex_aov object:
a1 <- aov_ez("id", "value", obk.long, between = c("treatment", "gender"), 
        within = c("phase", "hour"), observed = "gender")


if (requireNamespace("ggplot2") & requireNamespace("emmeans")) {
# 1b. plot data using afex_plot function, for more see: 
## vignette("afex_plot_introduction", package = "afex")

## default plot uses multivariate model-based CIs
afex_plot(a1, "hour", "gender", c("treatment", "phase"))

  
a1b <- aov_ez("id", "value", obk.long, between = c("treatment", "gender"), 
        within = c("phase", "hour"), observed = "gender", 
        include_aov = TRUE)
## you can use a univariate model and CIs if you refit the model with the aov
## slot
afex_plot(a1b, "hour", "gender", c("treatment", "phase"), 
          emmeans_arg = list(model = "univariate"))

## in a mixed between-within designs, no error-bars might be preferrable:
afex_plot(a1, "hour", "gender", c("treatment", "phase"), error = "none")
}

if (requireNamespace("emmeans")) {
library("emmeans")  # package emmeans needs to be attached for follow-up tests.

# 2. obtain reference grid object (default uses multivariate model):
r1 <- emmeans(a1, ~treatment +phase)
r1

# 3. create list of contrasts on the reference grid:
c1 <- list(
  A_B_pre = c(rep(0, 6), 0, -1, 1),  # A versus B for pretest
  A_B_comb = c(-0.5, 0.5, 0, -0.5, 0.5, 0, 0, 0, 0), # A vs. B for post and follow-up combined
  effect_post = c(0, 0, 0, -1, 0.5, 0.5, 0, 0, 0), # control versus A&B post
  effect_fup = c(-1, 0.5, 0.5, 0, 0, 0, 0, 0, 0), # control versus A&B follow-up
  effect_comb = c(-0.5, 0.25, 0.25, -0.5, 0.25, 0.25, 0, 0, 0) # control versus A&B combined
)

# 4. test contrasts on reference grid:
contrast(r1, c1)

# same as before, but using Bonferroni-Holm correction for multiple testing:
contrast(r1, c1, adjust = "holm")

# 2. (alternative): all pairwise comparisons of treatment:
emmeans(a1, "treatment", contr = "pairwise")
}

#######################
## 3: Other examples ##
#######################
data(obk.long, package = "afex")

# replicating ?Anova using aov_car:
obk_anova <- aov_car(value ~ treatment * gender + Error(id/(phase*hour)), 
        data = obk.long, type = 2)
# in contrast to aov you do not need the within-subject factors outside Error()

str(obk_anova, 1, give.attr = FALSE)
# List of 5
#  $ anova_table:Classes ‘anova’ and 'data.frame':	15 obs. of  6 variables:
#  $ aov        : NULL
#  $ Anova      :List of 14
#  $ lm         :List of 13
#  $ data       :List of 3

obk_anova$Anova
# Type II Repeated Measures MANOVA Tests: Pillai test statistic
#                             Df test stat approx F num Df den Df    Pr(>F)    
# (Intercept)                  1   0.96954   318.34      1     10 6.532e-09 ***
# treatment                    2   0.48092     4.63      2     10 0.0376868 *  
# gender                       1   0.20356     2.56      1     10 0.1409735    
# treatment:gender             2   0.36350     2.86      2     10 0.1044692    
# phase                        1   0.85052    25.61      2      9 0.0001930 ***
# treatment:phase              2   0.68518     2.61      4     20 0.0667354 .  
# gender:phase                 1   0.04314     0.20      2      9 0.8199968    
# treatment:gender:phase       2   0.31060     0.92      4     20 0.4721498    
# hour                         1   0.93468    25.04      4      7 0.0003043 ***
# treatment:hour               2   0.30144     0.35      8     16 0.9295212    
# gender:hour                  1   0.29274     0.72      4      7 0.6023742    
# treatment:gender:hour        2   0.57022     0.80      8     16 0.6131884    
# phase:hour                   1   0.54958     0.46      8      3 0.8324517    
# treatment:phase:hour         2   0.66367     0.25     16      8 0.9914415    
# gender:phase:hour            1   0.69505     0.85      8      3 0.6202076    
# treatment:gender:phase:hour  2   0.79277     0.33     16      8 0.9723693    
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

[Package afex version 1.3-1 Index]