R: Analysis of Variance

ANOVA {lessR}

R Documentation

Analysis of Variance

Description

Abbreviation: av, av_brief

Analysis of variance from the R aov function plus graphics and effect sizes. The included designs are one-way between groups, two-way between groups, and randomized blocks with one treatment factor and one observation for each combination of treatment and block.

Output is generated into distinct segments by topic, organized and displayed in sequence by default. When the output is assigned to an object, such as a in a <- ANOVA(Y ~ X), the full or partial output can be accessed for later analysis and/or viewing. A primary use of the saved output is with knitr for dynamic report generation. The input to knitr is written comments and interpretation with embedded R code, called R Markdown. The Rmd option generates a complete, though at this time preliminary, R Markdown document that is ready to knit. Specify the option with a file name and run the ANOVA function to create the file. Then open the newly created .Rmd file in RStudio and click the knit button to create a formatted document that consists of the statistical results and interpretative comments. See the sections Arguments, Value and Examples for more information.

Usage

ANOVA(my_formula, data=d, filter=NULL,
         brief=getOption("brief"), digits_d=NULL, 
         Rmd=NULL, jitter_x=0.4,
         res_rows=NULL, res_sort=c("zresid", "fitted", "off"),
         graphics=TRUE, pdf=FALSE, width=5, height=5,
         fun_call=NULL, ...)

av(...)

av_brief(..., brief=TRUE)

Arguments

`my_formula`	Standard R `formula` for specifying a model. Use an asterisk, `*`, separating the two factors for a two-way ANOVA, and a plus, `+`, separating the factors for a randomized blocks ANOVA with the blocking factor listed second.
`data`	Data frame that contains the data for analysis. The default name is `d`; otherwise specify the name explicitly.
`filter`	A logical expression that specifies a subset of rows of the data frame to analyze.
`brief`	If set to `TRUE`, reduces the text output by omitting the Tukey multiple comparisons of means and the residuals. The system default can be changed with the `style` function.
`digits_d`	For the Basic Analysis, it provides the number of decimal digits. For the rest of the output, it is a suggestion only.
`Rmd`	File name for the file of R Markdown instructions to be written, if specified. The file type is .Rmd, which automatically opens in RStudio, but it is a simple text file that can be edited with any text editor, including RStudio.
`jitter_x`	Amount of horizontal jitter for points in the scatterplot of levels and response variable for a one-way ANOVA.
`res_rows`	Default is 20, which lists the first 20 rows of data and residuals sorted by the specified sort criterion. To disable residuals, specify a value of 0. To see the residuals output for all observations, specify a value of `"all"`.
`res_sort`	Default is `"zresid"`, for specifying standardized residuals as the sort criterion for the display of the rows of data and associated residuals. Other values are `"fitted"` for the fitted values and `"off"` to not sort the rows of data.
`graphics`	Produce graphics. Default is `TRUE`. With `Rmd`, it can be useful to set to `FALSE` so that `regPlot` can be used to place the graphics within the output file.
`pdf`	If `TRUE`, the graphics are saved as pdf files instead of directed to the standard graphics windows.
`width`	Width of the pdf file in inches.
`height`	Height of the pdf file in inches.
`fun_call`	Function call. Used with `Rmd` to pass the function call when obtained from the abbreviated function call `av`.
`...`	Other parameter values for R function `lm` which provides the core computations.

Details

OVERVIEW
The one-way ANOVA with Tukey HSD and the corresponding plot is based on the R functions aov and TukeyHSD, and provides summary statistics for each level. The two-factor ANOVA also provides an interaction plot of the means with interaction.plot as well as a table of means and other summary statistics. The two-factor analysis can be between groups or a randomized blocks design. Residuals are displayed by default. Tukey HSD comparisons and residuals are not displayed if brief=TRUE.
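
The base R calls named in the overview can be sketched as follows for a one-way analysis. This is only an illustration with the built-in PlantGrowth data, not the internal code of ANOVA. The object names fit, hsd and level_stats are chosen here for the example.

```r
# One-way between-groups ANOVA with base R (illustrative sketch only)
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)                       # analysis of variance summary table

# Tukey HSD pairwise comparisons of the group means
hsd <- TukeyHSD(fit)
hsd

# Summary statistics for each level of the treatment factor
level_stats <- aggregate(
  weight ~ group, data = PlantGrowth,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)
level_stats
```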

The filter parameter subsets rows (cases) of the input data frame according to a logical expression. Use the standard R operators for logical statements as described in Logic, such as & for and, | for or and ! for not, and use the standard R relational operators as described in Comparison, such as == for logical equality, != for not equals, and > for greater than. See the Examples.
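
As a rough illustration of how such a logical expression selects rows, the following base R sketch evaluates conditions on the built-in PlantGrowth data. The names keep, keep2 and d_sub are hypothetical, and the subset call is only a base R analogue of what filter accomplishes, not the code that ANOVA runs.

```r
# A relational operator yields one TRUE/FALSE value per row
keep <- PlantGrowth$group != "trt2"        # FALSE for the second treatment
sum(keep)                                  # number of rows that would be retained

# Conditions can be combined with & (and), | (or) and ! (not)
keep2 <- !(PlantGrowth$group == "trt2") & PlantGrowth$weight > 4
table(keep2)

# Base R analogue of subsetting the rows before the analysis
d_sub <- subset(PlantGrowth, group != "trt2")
nrow(d_sub)
```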

MODEL SPECIFICATION
In the following specifications, Y is the response variable, X is a treatment variable and Blocks is the blocking variable. The distinction between the one-way randomized blocks and the two-way between groups models is not the variable names, but rather the delimiter between the variable names. Use * to indicate a two-way crossed between groups design and + for a randomized blocks design.
one-way between groups: ANOVA(Y ~ X)
one-way randomized blocks: ANOVA(Y ~ X + Blocks)
two-way between groups: ANOVA(Y ~ X1 * X2)
For more complex designs, use the standard R function aov upon which ANOVA depends.
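
The effect of the two delimiters can be inspected with base R, as in the sketch below. It lists the terms that each formula expands to and then fits a two-way model directly with aov on the built-in warpbreaks data. The name fit2 is chosen here for illustration.

```r
# Term labels show how each delimiter expands in a model formula
attr(terms(Y ~ X1 * X2), "term.labels")     # "X1" "X2" "X1:X2"
attr(terms(Y ~ X + Blocks), "term.labels")  # "X" "Blocks"

# Two-way between-groups model fit directly with aov
fit2 <- aov(breaks ~ wool * tension, data = warpbreaks)
summary(fit2)
```

The asterisk adds the interaction term to the two main effects, whereas the plus sign specifies main effects only, which is why it is used for the randomized blocks design.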

BALANCED DESIGN
The design for the two-factor analyses must be balanced. A check is performed and processing ceases if not balanced. For unbalanced designs, consider the function lmer in the lme4 package.
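
Whether a two-factor design is balanced can be checked beforehand by tabulating the cell sizes. The following is a minimal base R sketch with the built-in warpbreaks data. It is not necessarily the check that ANOVA itself performs, and the names cell_n and is_balanced are hypothetical.

```r
# Number of observations in each combination of the two factors
cell_n <- table(warpbreaks$wool, warpbreaks$tension)
cell_n

# The design is balanced when every cell has the same number of observations
is_balanced <- length(unique(as.vector(cell_n))) == 1
is_balanced
```

For a randomized blocks design with one observation for each treatment and block combination, the same table would contain a 1 in every cell.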

DECIMAL DIGITS
The number of decimal digits displayed on the output is, by default, the maximum number of decimal digits for all the data values of the response variable. Or, this value can be explicitly specified with the digits_d parameter.
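
The default rule can be approximated in base R as shown below. The helper max_decimals is hypothetical, written only to illustrate the idea of taking the maximum number of decimal digits over the values of the response variable. It is not the lessR implementation.

```r
# Hypothetical helper: largest number of decimal digits among the values of x
max_decimals <- function(x) {
  # Format each value without padding and without trailing zeros
  s <- format(x[!is.na(x)], trim = TRUE, drop0trailing = TRUE,
              scientific = FALSE)
  has_dec <- grepl(".", s, fixed = TRUE)
  # Count the characters that follow the decimal point, 0 if there is none
  dec <- ifelse(has_dec, nchar(sub("^[^.]*\\.", "", s)), 0L)
  max(dec)
}

max_decimals(PlantGrowth$weight)   # response measured to two decimal places
max_decimals(warpbreaks$breaks)    # integer counts
```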

Value

The output can optionally be returned and saved into an R object, otherwise it simply appears at the console. The components of this object were redesigned in lessR version 3.3.5 into (a) pieces of text that form the readable output and (b) a variety of statistics. The readable output consists of character strings such as tables amenable for viewing and interpretation. The statistics are numerical values amenable for further analysis, such as to be referenced in a subsequent R Markdown document. The motivation for these two types of output is to facilitate R Markdown documents, as the name of each piece, preceded by the name of the saved object followed by a $, can be inserted into the R Markdown document (see examples).

TEXT OUTPUT
out_background: variables in the model, rows of data and rows retained
1-predictor: out_descriptive: descriptive stats
2-predictors: out_cell.n: cell sample size
2-predictors: out_cell.means: cell means
2-predictors: out_cell.marginals: marginal means
2-predictors: out_cell.gm: grand mean
2-predictors: out_cell.sd: cell standard deviations
out_anova: analysis of variance summary table
out_effects: effect sizes
out_hsd: Tukey's honestly significant difference analysis
out_res: residuals
out_plots: list of plots generated if more than one

Separated from the rest of the text output are the major headings, which can then be deleted from custom collations of the output.
out_title_bck: BACKGROUND
out_title_des: DESCRIPTIVE STATISTICS
out_title_basic: BASIC ANALYSIS
out_title_res: RESIDUALS

STATISTICS
call: function call that generated the analysis
formula: model formula that specifies the model
n_vars: number of variables in the model
n_obs: number of rows of data submitted for analysis
n_keep: number of rows of data retained in the analysis
1-predictor: p_value: p-value for the overall F-test
residuals: residuals
fitted: fitted values

Although not typically needed for analysis, if the output is assigned to an object named, for example, a, then the complete contents of the object can be viewed directly with the unclass function, here as unclass(a). Invoking the class function on the saved object reveals a class of out_all. The class of each of the text pieces of output is out.
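
A short sketch of saving the output and accessing individual pieces follows. It assumes that the lessR package is installed and is skipped otherwise. The component names are those listed above, and the object name a is arbitrary.

```r
# Assumes the lessR package is installed; the block is skipped otherwise
if (requireNamespace("lessR", quietly = TRUE)) {
  library(lessR)

  # Save the output into an object; graphics are suppressed for this sketch
  a <- ANOVA(weight ~ group, data = PlantGrowth, graphics = FALSE)

  print(a$out_anova)        # text piece: analysis of variance summary table
  print(a$p_value)          # statistic: p-value for the overall F-test
  print(class(a))           # class of the saved object
  print(names(unclass(a)))  # names of all pieces stored in the object
}
```

In an R Markdown document, the same references, such as a$out_anova, can be placed in a code chunk or inline.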

Author(s)

David W. Gerbing (Portland State University; gerbing@pdx.edu)

References

Gerbing, D. W. (2023). R Data Analysis without Programming: Explanation and Interpretation, 2nd edition, Chapters 8 and 9, NY: Routledge.

Gerbing, D. W. (2021). Enhancement of the Command-Line Environment for use in the Introductory Statistics Course and Beyond, Journal of Statistics and Data Science Education, 29(3), 251-266, https://www.tandfonline.com/doi/abs/10.1080/26939169.2021.1999871.

Examples



# access the PlantGrowth data frame
ANOVA(weight ~ group, data=PlantGrowth)
# brief version
av_brief(weight ~ group, data=PlantGrowth)

# drop the second treatment, just control and 1 treatment
ANOVA(weight ~ group, data=PlantGrowth, filter=(group != "trt2"))

# variables of interest in a data frame that is not the default d
# two-factor between-groups ANOVA with replications and interaction
# warpbreaks is a data set provided with R
ANOVA(breaks ~ wool * tension, data=warpbreaks)

# randomized blocks design with the second term the blocking factor
#   data from Gerbing (2014, Sec 7.3.1)

# Each person is a block. Each person takes four weight-training
#   supplements on different days, and the repetitions of the
#   bench press are then counted.
d <- read.csv(header=TRUE, text="
Person,sup1,sup2,sup3,sup4
p1,2,4,4,3
p2,2,5,4,6
p3,8,6,7,9
p4,4,3,5,7
p5,2,1,2,3
p6,5,5,6,8
p7,2,3,2,4")

# reshape data from wide form to long form
# do not need the row names
d <- reshape(d, direction="long",
        idvar="Person", v.names="Reps",
        varying=list(2:5), timevar="Supplement")
rownames(d) <- NULL

ANOVA(Reps ~ Supplement + Person)

[Package lessR version 4.3.6 Index]