R: Example 4: One qualitative treatment factor with repeated...

example4 {agriTutorial}

R Documentation

Example 4: One qualitative treatment factor with repeated measurements over time.

Description

Milliken & Johnson (1992, p. 429) discuss data which they describe as repeated leaf index measurements on sorghum. Their data set comprises five replicate blocks of four sorghum varieties and they assume equally spaced repeated measurements on each plot in each block on five consecutive occasions starting two weeks after emergence. No further information is given but it appears that the data is simulated or made-up rather than real. Although real data is more authentic, it can sometimes be useful to discuss the analysis of an example data set from the literature, even when the data is simulated. Milliken & Johnson discuss multivariate analysis of variance of the data but this method take no account of the ordered relationship between repeated observations or the likely correlation structure of the data and we will discuss alternative correlation models that are specifically intended to account for the underlying structure of repeated measures data. The interested reader can, if desired, refer to Milliken & Johnson (1992) Chapter 31 for comparison of the different approaches.

Details

Section 1 calculates polynomials for weeks and blocks using the poly() function. Two sets of polynomials for weeks, raw and orthogonal, are calculated and saved as sorghum$rawWeeks and sorghum$polWeeks respectively. Orthogonal polynomials for blocks are calculated and saved as sorghum$polBlocks. It is important to note that the poly() function calculates all polynomial contrasts up to the required degree but does NOT include the zero-degree polynomial. Additionally, the block variable varblock is saved as a factor factblock.

Section 2 compares five different correlation structures for the repeated measures analysis using the gls() function of the nlme package. Each analysis fits a full factorial model for the variety-by-weeks and blocks-by-weeks effects assuming block and treatment additivity. The goodness of fit of the five models is compared by AIC statistics where the smaller the AIC the better the fit. Here, the AR(1)+nugget model fitted by the corExp() function gave the best fitting model. See help(corExp)for further information about the corExp() function. Note that corSymm represents a general correlation structure and will, presumably, give an analysis similar to a multivariate analysis of variance. Although this structure appears to give the best fit according to the negative log likelihood statistic, this criterion takes no account of the number of estimated variance parameters p in the variance model which, in the case of the corSymm model, is p = 15, compared to only p = 2 for the AR(1) model. When assessed by the AIC statistic, the corSymm model gave the least good fit of any of the non-null correlation structures which is strong evidence that the multivariate analysis of variance method discussed by Milliken & Johnson (1992) will lack power.

Section 3 fits a full regression model over the five weeks of repeated measures and tests for possible variety and variety-by-weeks interactions effects. The weeks factor is decomposed into individual polynomial contrasts (see Table A2 and Table 14) to test the significance of each individual variety-by-weeks polynomial effect. The analysis of polynomial contrasts shows that the variety-by-weeks interaction is due mainly to the degree-1 = variety:rawWeeks[,1] and the degree-2 = variety:rawWeeks[,2] effects, although there is also some evidence of higher-degree variety-by-weeks interaction effects. The analysis also shows the corExp() range and nugget statistics for the full fitted model and these are used to calculate the correlation coefficient usingthe formula rho=(1-nugget)*exp(-1/range). Note that this formula is different from the the formula used in Tables A1 and A2 and will give a different value of rho: see help(corExp).

Section 4 fits a quadratic regression model for weeks assuming the degree-3 and degree-4 polynomial week effects are zero. The average effects of blocks are fitted by polBlocks and the interactions between the blocks and the weeks are fitted by polBlocks:(rawWeeks[,1] + rawWeeks[,2] + polWeeks[,3]+ polWeeks[,4]). The gls() algorithm requires the same polynomial weeks contrasts in both the blocks and the varieties models which is why raw degree-1 and degree-2 weeks contrasts have been used for the blocks-by-weeks interaction model. However, orthogonal polynomials have better numerical stability than raw polynomials so orthogonal polynomial contrasts have been used for the degree-3 and degree-4 weeks contrasts. The summary analysis shows all variety effects as differences from the intercept which, in this analysis, is variety 1 therefore all model effects in Table 15 can be derived by adding appropriate effects to the intercept. If SED's are required, these must be calculated from the variance/covariance matrix which can be extracted by the code vcov(). Using this matrix, the SED for variety differences was calculated to be 0.172, the SED for the variety-by-linear weeks slope parameters was calculated to be 0.117 and the SED for the variety-by-quadratic weeks slope parameters was calculated to be 0.0192. These estimates are approximately 2-3 percent larger than those shown in Table 15 but it is not clear if the discrepancies are due to the model specification or to a difference between the R and the SAS software. Possibly the implementation of the Kenward-Roger method of adjusting the denominator d.f. and the estimated variance-covariance matrix of the estimated fixed effects might be different for the two algorithms. The range, nugget and correlation coefficient are extracted and displayed and a graphical plot of the studentized residuals from the quadratic regression model is also shown.

Section 5 fits a quadratic regression model for variety-by-week interaction effects assuming a full degree-4 polynomial model for weeks and blocks-by-weeks effects. The quadratic regression model in Section 4 corresponds to the regression model used for Tables 14 and 15 of Piepho and Edmondson (2018) but the range = 3397131013 and nugget = 0.4605535 of this model are very different from the range = 10.35774 and nugget = 0.1720444 of the full factorial model. As there is evidence from Table A2 that the degree-3 and degree-4 polynomial weeks effects are non-negligible, the quadratic model for weeks effects in Section 4 may be inadequate for the data and the model may be underfitted. In this section, the assumption that the degree-3 and degree-4 polynomial weeks effects are zero is relaxed and a full degree-4 model for weeks and block-by-weeks interaction effects is fitted. The fitted model for treatment effects needs to be as parsimonious as possible to ensure that estimates of treatment effects are robust against model assumptions and a degree-2 regression model for variety-by-weeks effects appears to be the most appropriate treatment model for this data. With this model, the values of the auto-correlation parameters are: range = 42.75763, nugget = 0.3586337 and correlation = 0.6265403 which are much closer to the autocorrelation parameters from the full factorial model than are those from Section 4. As the model fits the full polynomial weeks model, it is not necessary to use polynomial blocks contrasts which gives a substantial simplification in coding.

Comment The model fitted in Section 5 appears to be the best model available based on the generalized least squares method but it is clear from the graphical plots of studentized residuals that the fitted data contains outliers that are not well accommodated by the fitted model. If the data was from a real experiment, further information about the data might be available but as the data seems to be artificial this option is not available. In this situation, various robust methods of model fitting or regression analysis that can accommodate non-standard distributions or model outliers are available. However, these methods are beyond the scope of this tutorial and will not be discussed further here.

agriTutorial: return to home page if you want to select a different example

References

Milliken, G.A., & Johnson, D.E. (1992). Analysis of messy data. Volume I: Designed experiments. Boca Raton: CRC Press.

Piepho, H. P, and Edmondson. R. N. (2018). A tutorial on the statistical analysis of factorial experiments with qualitative and quantitative treatment factor levels. Journal of Agronomy and Crop Science. DOI: 10.1111/jac.12267. View

Examples


## *************************************************************************************
##                       How to run the code
## *************************************************************************************

## Either type example("example4") to run ALL the examples succesively
## or copy and paste examples sucessively, as required

## *************************************************************************************
##                          Options and required packages
## *************************************************************************************

options(contrasts = c('contr.treatment','contr.poly'))
require(nlme)

## *************************************************************************************
##            Section 1:  Polynomials for weeks and blocks contrasts
## *************************************************************************************

sorghum$rawWeeks = poly(sorghum$varweek, degree = 4, raw = TRUE)
sorghum$polWeeks = poly(sorghum$varweek, degree = 4, raw = FALSE)
sorghum$polBlocks = poly(sorghum$varblock, degree = 4, raw = FALSE)
sorghum$factblock = factor(sorghum$varblock)

## *************************************************************************************
## Section 2: Various correlation models assuming full factorial blocks and weeks model
## *************************************************************************************

AIC = NULL
logLik = NULL
Model = c("ID", "CS", "AR(1)", "AR(1) + nugget", "UN")

## independent uncorrelated random plots
full_indy = gls(y ~ factweek * (Replicate + variety), sorghum)
anova(full_indy)
AIC = c(AIC, AIC(full_indy))
logLik = c(logLik, logLik(full_indy))

## corCompSymm compound symmetry
corCompSymm = gls(y ~ factweek * (Replicate + variety),
corr = corCompSymm(form = ~ varweek|factplot), sorghum)
anova(corCompSymm)
AIC = c(AIC, AIC(corCompSymm))
logLik = c(logLik, logLik(corCompSymm))
Variogram(corCompSymm)

## corExp without nugget
corExp = gls(y ~ factweek * (Replicate + variety),
 corr = corExp(form = ~ varweek|factplot), sorghum)
anova(corExp)
AIC = c(AIC, AIC(corExp))
logLik = c(logLik, logLik(corExp))
Variogram(corExp)

##  corExp with nugget
corExp_nugget = gls(y ~ factweek * (Replicate + variety),
 corr = corExp(form = ~ varweek|factplot, nugget = TRUE), sorghum)
anova(corExp_nugget)
AIC = c(AIC, AIC(corExp_nugget))
logLik = c(logLik, logLik(corExp_nugget))
Variogram(corExp)

##  corSymm unstructured
corSymm = gls(y ~ factweek * (Replicate + variety), corr = corSymm(form = ~ 1|factplot),
 weights = varIdent(form = ~ 1|varweek), sorghum)
anova(corSymm)
AIC = c(AIC, AIC(corSymm))
logLik = c(logLik, logLik(corSymm))
Variogram(corSymm)

##  Table 11 Comparison of log Likelihood and AIC statistics for different correlation structures
dAIC = AIC - AIC[4]
logLik = -2 * logLik
dlogLik = logLik - logLik[4]
AICtable = data.frame(Model, round(logLik, 2), round(dlogLik, 2), round(AIC, 2), round(dAIC, 2))
colnames(AICtable) = c("Covar_Model", "-2logLr", "-diff2logLr", "AIC", "diffAIC")
AICtable

## *************************************************************************************
##  Section 3: Factorial block and variety effects assuming full polynomial week effects
## *************************************************************************************

## Table A2 (cf Table 14) Sequential Wald tests for full model sorghum data
pol_Wald =
gls(y ~ (factblock+variety) * (rawWeeks[,1] + rawWeeks[,2] + polWeeks[,3] + polWeeks[,4]),
corr = corExp(form = ~ varweek | factplot, nugget = TRUE), sorghum)
anova(pol_Wald)
range=coef(pol_Wald$modelStruct$corStruct,unconstrained=FALSE)[1]
nugget=coef(pol_Wald$modelStruct$corStruct,unconstrained=FALSE)[2]
rho=(1-nugget)*exp(-1/range)
cat("Range =", range, "\n")
cat("Nugget =", nugget, "\n")
cat("Correlation =", rho, "\n")
ACF(pol_Wald)
plot(pol_Wald,sub.caption = NA, main = "Residuals from full polynomial weeks model")

## *************************************************************************************
## Section 4: Degree-2 model for weeks-by-variety and weeks-by-blocks effects assuming
## degree-3 and degree-4 week effects are zero
## *************************************************************************************

 ## Table 15 coefficients assuming a quadratic weeks model for both block and treatment effects
quad_Wald = gls(y ~  polBlocks + variety + rawWeeks[,1] + rawWeeks[,2] +
 polBlocks:(rawWeeks[,1] + rawWeeks[,2]+ polWeeks[,3] + polWeeks[,4]) +
 variety:(rawWeeks[,1] + rawWeeks[,2]),
 corr = corExp(form = ~ varweek | factplot, nugget=TRUE), sorghum)
anova(quad_Wald)
summary(quad_Wald)$tTable
vcov(quad_Wald)
range=coef(quad_Wald$modelStruct$corStruct,unconstrained=FALSE)[1]
nugget=coef(quad_Wald$modelStruct$corStruct,unconstrained=FALSE)[2]
rho=(1-nugget)*exp(-1/range)
cat("Range =", range, "\n")
cat("Nugget =", nugget, "\n")
cat("Correlation =", rho, "\n")
plot(quad_Wald,sub.caption = NA, main = "Residuals from quadratic regression model")

## *************************************************************************************
## Section 5: Quadratic model for weeks-by-variety effects assuming full degree-4 model
## for weeks and weeks-by-blocks effects
## *************************************************************************************

##  Model assuming a quadratic variety-by-weeks model and quartic blocks-by-weeks model
quad_Wald = gls(y ~ Replicate * (rawWeeks[,1] + rawWeeks[,2] + polWeeks[,3] + polWeeks[,4]) +
 variety * (rawWeeks[,1] + rawWeeks[,2]),
 corr = corExp(form = ~ varweek | factplot, nugget = TRUE), sorghum)
anova(quad_Wald)
summary(quad_Wald)$tTable
range=coef(quad_Wald$modelStruct$corStruct,unconstrained=FALSE)[1]
nugget=coef(quad_Wald$modelStruct$corStruct,unconstrained=FALSE)[2]
rho=(1-nugget)*exp(-1/range)
cat("Range =", range, "\n")
cat("Nugget =", nugget, "\n")
cat("Correlation =", rho, "\n")
plot(quad_Wald,sub.caption = NA, main = "Quadratic treatment-by-weeks model with full
blocks-by-weeks model")

[Package agriTutorial version 0.1.5 Index]