chooseSpatialModelOnIC.asrtests {asremlPlus}R Documentation

Uses information criteria to choose the best fitting spatial model for accounting for local spatial variation.

Description

For a response variable measured on a potentially irregular grid of rows and columns of the units, uses information criteria (IC) to decide whether the fit and parsimony of the model fitted to a set of data can be improved by adding, to the fitted model stored in the supplied asrtests.object, one of the following spatial models to account for the local spatial variation: (i) a two-dimensional first-order autocorrelation model, (ii) a two-dimensional tensor-product natural cubic smoothing spline model (TPNCSS), (iii) a two-dimensional tensor-product penalized P-spline model with second-difference penalties (TPPSC2) model, or (iv) a two-dimensional tensor-product penalized linear spline model with first-difference penalties (TPPSL1). The models from which to select can be reduced to a subset of these four models. For each model, a term from the spatial model is only added to the supplied model if the IC of the supplied model is decreased with the addition of that term. If no model improves the IC when a local spatial variation model is added, then the supplied, nonspatial model will be returned. The data can be arranged in sections, for each of which there is a grid and for which the model is to be fitted separately. Also, the rows and columns of a grid are not necessarily one observational unit wide. For TPPSC2 models, the improvement in the fit from rotating the eigenvectors of the penalty matrix can be investigated; if there is no improvement, the unrotated fit will be returned.

One or more rows is added to the test.summary data.frame of the asrtests.object, for each section and each spatial model, stating whether or not the new model has been swapped for a model in which the spatial model has been added to the supplied model. Convergence in fitting the model is checked and a note included in the action if there was not. All components of the asrtests.object are updated to exhibit the differences between the supplied and any new model.

Usage

## S3 method for class 'asrtests'
chooseSpatialModelOnIC(asrtests.obj, trySpatial = "all", 
                       sections = NULL, 
                       row.covar = "cRow", col.covar = "cCol", 
                       row.factor = "Row", col.factor = "Col", 
                       corr.funcs = c("ar1", "ar1"), corr.orders = c(0, 0), 
                       row.corrFitfirst = TRUE, allow.corrsJointFit = TRUE, 
                       dropFixed = NULL, dropRandom = NULL, 
                       nsegs = NULL, nestorder = c(1,1), 
                       usRandLinCoeffs = TRUE, 
                       rotateX = FALSE, ngridangles = NULL, 
                       which.rotacriterion = "AIC", nrotacores = 1, 
                       asreml.option = "grp", tpps4mbf.obj = NULL, 
                       allow.unconverged = FALSE, 
                       allow.fixedcorrelation = FALSE,
                       checkboundaryonly = FALSE, update = FALSE, 
                       maxit = 30, IClikelihood = "full", which.IC = "AIC", 
                       return.asrts = "best", ...)

Arguments

asrtests.obj

An asrtests.object containing the components (i) asreml.obj, (ii) wald.tab, and (iii) test.summary.

trySpatial

A character string nominating the types of spatial model whose fits are to be assessed. Possible values are none, all, corr, TPNCSS, TPPSC2 (or TPPCS), and TPPSL1 (or TPP1LS). If set to none, then just the supplied nonspatial model and the information about its information criteria will be returned. If all, then corr, TPNCSS, TPPSC2 and TPPSL1 will be fitted. Which fitted models are returned is controlled by return.asrts.

sections

A single character string that specifies the name of the column in the data.frame that contains the factor that identifies different sections of the data to which separate spatial models are to be fitted. Note that, for other terms that involve sections in the random formula, there should be separate terms for each level of sections. For example, in a blocked experiment involving multiple sites, there should be the sum of separate terms for the Blocks at each Site i.e. a formula that contains terms like at(Site, i):Block for each site and these are separated by '+'. Otherwise, the combined term (e.g. Site:Block) will impact on the fitting of the local spatial models for the different Sites. Similarly, a separate residual variance for each of the sections should be fitted, unless there is a need to fit a different variance structure to the residual, e.g. heterogeneous residual variances depending on treatments. Separate residual variances for sections can be achieved using the asreml functions dsum or idh. Because, unlike random terms, terms for residual variances are not removed from the model, compound residual terms can be used to include them in the model, e.g. terms with idh or dsum with multiple levels in the list or leaving levels out altogether. In addition to allowing the independent fitting of models to the sections, separate residual variance terms allows a nugget variance to be fitted in a correlation model for each of the sections.

row.covar

A single character string nominating a numeric that contains the values of a centred covariate indexing the rows of a grid. The numeric must be a column in the data.frame stored in the asreml.obj that is a component of the supplied asrtests.obj.

col.covar

A single character string nominating a numeric that contains the values of a centred covariate indexing the columns of a grid. The numeric must be a column in the data.frame stored in the asreml.obj that is a component of the supplied asrtests.obj.

row.factor

A single character string nominating a factor that indexes the rows of a grid that are to be one dimension of a spatial correlation model. The factor must a column in the data.frame stored in the asreml.obj that is a component of the supplied asrtests.obj.

col.factor

A single character string nominating a factor that indexes the columns of a grid that are to be one dimension of a spatial correlation model. The factor must a column in the data.frame stored in the asreml.obj that is a component of the supplied asrtests.obj.

corr.funcs

A single character string of length two that specifies the asreml one-dimensional correlation or variance model function for the row and column dimensions of a two-dimensional separable spatial correlation model to be fitted when spatial.model is corr; the two-dimensional model is fitted as a random term. If a correlation or variance model is not to be investigated for one of the dimensions, specify "" for that dimension. If the correlation model is corb, the values of corr.orders are used for its order argument (b).

corr.orders

A numeric of length two that specifies the order argument (b) values for the row and column dimensions of a two-dimensional separable spatial correlation model when spatial.model is corr and the corr.funcs for a dimension is corb, the asreml banded correlation model. If one of the dimensions does not involve an order argument, set the value of corr.orders for that dimension to zero. For a dimension for which the corr.funcs is corb and corr.orders is zero, a model with a single band, the correlation between immediate neighbours, will be fitted and then further bands, up to a maximum of 10 bands, will be added until the addition of an extra band does not reduce the information criterion nominated using which.IC. Note that the two-dimensional spatial model is fitted as a random term.

row.corrFitfirst

A logical. If TRUE then, in fitting the model for spatial.model set to corr, the row correlation or variance function is fitted first, followed by the addition of the column correlation or variance function. If FALSE, the order of fitting is reversed.

allow.corrsJointFit

A logical which, if TRUE, will allow the simultaneous fitting of correlation functions for the two dimensions of the grid when separate fits have failed to fit any correlation functions. This argument is available for when a joint fit hangs the system.

dropFixed

A single character string or a character vector of strings with an element for each level of sections in the same order as the sections levels. Each string, which if it is not NA and after the addition of ". ~ . -" and conversion to a formula that is then expanded, specifies the sum of a set of terms to be dropped from the fixed formula in fitting splines (TPPS and TPNCSS). The result is that the fitted model supplied in the asrtests.obj, that includes these terms, will be compared with the fitted model that has had them removed and a spatial model added.

An element that is NA indicates that no term pertaining to the corresponding sections level is to be removed. If sections is not NULL and a single character string has been supplied, the terms specified by the string are taken to be terms that are independent of the sections and will be removed when adding the spatial model for the first sections.

The terms must match those in the wald.tab component of the asrtests.obj. The fixed terms will be reordered so that single-variable terms come first, followed by two-variable terms and so on. Note also that multiple terms specified using a single asreml::at function can only be dropped as a whole. If the term was specified using an asreml::at function with a single level, then it can be removed and either the level itself or its numeric position in the levels returned by the levels function can be specified.

dropRandom

A single character string or a character vector of strings with an element for each level of sections in the same order as the sections levels. Each string, which if it is not NA and after the addition of " ~ . -" and conversion to a formula that is then expanded, specifies the sum of a set of terms to be dropped from the random formula in fitting splines (TPPS and TPNCSS). The result is that the fitted model supplied in the asrtests.obj, that includes these terms, will be compared with the fitted model that has had them removed and a spatial model added.

An element that is NA indicates that no term pertaining to the corresponding sections level is to be removed. If sections is not NULL and a single character string has been supplied, the terms specified by the string are taken to be terms that are independent of the sections and will be removed when adding the spatial model for the first sections.

The terms must match those in the vparameters component of the asreml.obj component in the asrtests.obj. Note also that multiple terms specified using a single asreml::at function can only be dropped as a whole. If the term was specified using an asreml::at function with a single level, then it can be removed and either the level itself or its numeric position in the levels returned by the levels function can be specified.

nsegs

A pair of numeric values giving the number of segments into which the column and row ranges are to be split, respectively, for fitting a P-spline model (TPPS) (each value specifies the number of internal knots + 1). If not specified, then (number of unique values - 1) is used in each dimension; for a grid layout with equal spacing, this gives a knot at each data value. If sections is not NULL and the grid differs between the sections, then nsegs will differ between the sections.

nestorder

A numeric of length 2. The order of nesting for column and row dimensions, respectively, in fitting a P-spline model (TPPS). A value of 1 specifies no nesting, a value of 2 generates a spline with half the number of segments in that dimension, etc. The number of segments in each direction must be a multiple of the order of nesting.

usRandLinCoeffs

A logical which, if TRUE, will attempt to fit an unstructured variance model to the constant and linear terms in the interactions for constant and linear terms in one grid dimension interacting with smoooth terms in the second grid dimension. The unstructured variance model can only be fitted if both the constant and linear interaction terms have been retained in the fitted model. This argument can be used to omit the attempt to fit an unstructured variance model when the attempt results in a stystem error.

rotateX

A logical indicating whether to rotate the eigenvectors of the penalty matrix, as described by Piepho, Boer and Williams (2022), when fitting a P-spline (TPPS). Setting rotateX to TRUE results in a a search for an optimized rotation under a model that omits the random spline interaction terms. If ngridangles is set to NULL, the optimal rotation us found using an optimizer (nloptr::bobyqa). Otherwise, the optimal rotation is found by exploring the fit over a two-dimensional grid of rotation angle pairs. The optimization seeks to optimize the criterion nominated in which.rotacriterion. Rotation of the eigenvectors is only relevant for difforder values greater than 1 and has only been implemented for difforder equal to 2.

ngridangles

A numeric of length 2. If NULL (the default), the optimal pair of angles for rotating the eignevectors of the penalty matrix of a P-spline (TPPS) will be determined using a nonlinear optimizer (nloptr::bobyqa). Otherwise, its two values specify the numbers of angles between 0 and 90 degrees for each of the row and column dimensions to be used in determining the optimal pair of angles. Specifying factors of 90 will result in integer-valued angles. The number of grid points, and hence re-analyses will be the product of the values of (ngridangles + 1).

which.rotacriterion

A single character string nominating which of the criteria, out of the deviance, the likelihood, the AIC and the BIC, is to be used in determining the optimal rotation of the eigenvectors of the penalty matrix. The deviance uses the REML value computed by asreml; the other criteria use the full likelihood, evaluated using the REML estimates, that is computed by infoCriteria.asreml.

nrotacores

A numeric specifying the number of cores to deploy for running the analyses required to search the two-diemsional grid of rotation angles when rotateX is TRUE. Parallel processing has been implemented for analyzing, for each column angle, the set of angles to be investigated for the row dimension. The default value of one means that parallel processing will not be used. The value chosen for nrotacores needs to balanced against the other processes that are using parallel processing at the same time.

asreml.option

A single character string specifying whether the grp or mbf methods are to be used to supply externally formed covariate matrices to asreml when fitting a P-spline (TPPS). Compared to the mbf method, the grp method is somewhat faster, but creates large asrtests.objects for which the time it takes to save them can exceed any gains in execution speed. The grp method adds columns to the data.frame containing the data. On the other hand, the mbf method adds only the fixed covariates to data and stores the random covariates in the environment of the internal function that calls the spline-fitting function; there are three smaller data.frames for each section that are not stored in the asreml.object resulting from the fitted model.

tpps4mbf.obj

An object made with makeTPPSplineMats.data.frame that contains the spline basis information for fitting P-splines. The argument tpps4mbf.obj only needs to be set when the mbf option of asreml.option is being used and it is desired to use mbf data.frames that have been created and stored prior to calling chooseSpatialModelOnIC.asrtests. If tpps4mbf.obj is NULL,
makeTPPSplineMats.data.frame will be called internally to produce the required mbf data.frames.

allow.unconverged

A logical indicating whether to accept a new model even when it does not converge. If FALSE and the fit of the new model does not converge, the supplied asrtests.obj is returned. Also, if FALSE and the fit of the new model has converged, but that of the old model has not, the new model will be accepted.

allow.fixedcorrelation

A logical indicating whether to accept a new model even when it contains correlations in the model whose values have been designated as fixed, bound or singular. If FALSE and the new model contains correlations whose values have not been able to be estimated, the supplied asrtests.obj is returned. The fit in the asreml.obj component of the supplied asrtests.obj will also be tested and a warning issued if both fixed correlations are found in it and allow.fixedcorrelation is FALSE.

checkboundaryonly

If TRUE then boundary and singular terms are not removed by rmboundary.asrtests; a warning is issued instead. Note that, for correlation models, the fitting of each dimension and the test for a nugget term are performed with checkboundaryonly set to TRUE and its supplied setting only honoured using a call to rmboundary.asrtests immediately prior to returning the final result of the fitting.

update

If TRUE, and set.terms is NULL, then newfit.asreml is called to fit the model to be tested, using the values of the variance parameters stored in the asreml.object, that is stored in asrtests.obj, as starting values. If FALSE or set.terms is not NULL, then newfit.asreml will not use the stored variance parameter values as starting values when fitting the new model, the only modifications being (i) to fit aptial terms and (ii) those specified via ....

which.IC

A character specifying the information criterion to be used in selecting the best model. Possible values are AIC and BIC. The value of the criterion for supplied model must exceed that for changed model for the changed model to be returned. (For choosing the rotation angle of the eigenvectors of the penalty matrix, see which.rotacriterion.

maxit

A numeric specifying the maximum number of iterations that asreml should perform in fitting a model.

IClikelihood

A character specifying whether Restricted Maximum Likelihood (REML) or the full likelihood, evaluated using REML estimates, (full) are to be used in calculating the information criteria to be included in the test.summary of an asrtests.object or to be used in choosing the best model.

return.asrts

A character string specifying whether the asrtests.object for the best fitting model (smallest AIC or BIC), including the supplied nonspatial model, is returned or the asrtests.objects resulting from the attempted fits of all of the models specified using trySpatial are returned.

...

Further arguments passed to changeModelOnIC.asrtests, asreml and tpsmmb.

Details

For each spatial model that is to be fitted, a fitted spatial model is only returned if it improves the fit over and above that achieved with the model fit supplied in the asrtests.obj, because terms in the spatial model are not added unless model fit is improved by their addition as measured by an IC. If return.asrts is all, then this applies to each spatial model specified by trySpatial. To force a spatial model to be fitted use addSpatialModel.asrtests. The model fit supplied in the asrtests.obj should not include terms that will be included in any local spatial model. All spatial model terms are fitted as fixed or random. Consequently, the residual model does not have to be iid. The improvement in the fit resulting from the addition of a spatial model to the supplied model is evaluated. Note that the data must be in the order that corresponds to the residual argument with a variable to the right of another variable changing levels in the data frame faster than those of the preceding variables e.g. Row:Column implies that all levels for Column are in consecutive rows of the data.frame that have a single Row level.

For the corr spatial model, the default model is an autocorrelation model of order one (ar1) for each dimension. However, any of the single dimension correlation/variance models from asreml can be specified for each dimension, as can no correlation model for a dimension; the models for the two dimensions can differ. Using a forward selection procedure, a series of models are tried, without removing boundary or singular terms, beginning with the addition of row correlation and followed by the addition of column correlation or, if the row.corrFitfirst is set to FALSE, the reverse order. If the fitting of the first-fitted correlation did not result in a model change because the fitting did not converge or correlations were fixed, but the fit of the second correlation was successful, then adding the first correlation will be retried. If one of the metric correlation functions is specified (e.g. exp), then the row.covar or col.covar will be used in the spatial model. However, because the correlations are fitted separately for the two dimensions, the row.factor and col.factor are needed for all models and are used for any dimension that does not involve a correlation/variance function for the fit being performed. Also, the correlation models are fitted as random terms and so the correlation model will include a variance parameter for the grid even when ar1 is used to specify the correlation model, i.e. the model fitted is a variance model and there is no difference between ar1 and ar1v in fitting the model. The variance parameter for this term represents the spatial variance and the fit necessarily includes a nugget term, this being the residual variance. If any correlation is retained in the model, for a section if sections is not NULL, then the need for a nuggest term is assessed by fixing the corresponding residual variance to one, unless there are multiple residual variances and these are not related to the sections. Once the fitting of the correlation model has been completed, the rmboundary function will be executed with the checkboundaryonly value supplied in the chooseSpatialModelOnIC.asrtests call. Finally, checking for bound and singular random terms associated with the correlation model and residual terms will be carried out when there are correlation terms in the model and checkboundaryonly has been set to FALSE; as many as possible will be removed from the fitted model, in some cases by fixing variance terms to one.

The tensor-product natural-cubic-smoothing-spline (TPNCSS) spatial model is as described by Verbyla et al. (2018), the tensor-product penalized-cubic-spline (TPPSC2) model with second-order differencing of the penalty is similar to that described by Rodriguez-Alvarez et al. (2018), and the tensor-product, first-difference-penalty, linear spline (TPPSL1) model is amongst those described by Piepho, Boer and Williams (2022). The fixed terms for the spline models are row.covar + col.covar + row.covar:col.covar and the random terms are spl(row.covar) + spl(col.covar) + dev(row.covar) + dev(col.covar) + spl(row.covar):col.covar + row.covar:spl(col.covar) + spl(row.covar):spl(col.covar), except that spl(row.covar) + spl(col.covar) is replaced with spl(row.covar):int(col.covar) + int(row.covar):spl(col.covar) in the TPPSC2 model, where int(.) indicates an intercept or constant value specific to its argument. For TPPSL1 models, the terms spl(row.covar):col.covar + row.covar:spl(col.covar) are omitted, The supplied model should not include any of these terms. However, any fixed or random main-effect Row or Column term that has been included as an initial model for comparison with a spatial model can be removed prior to fitting the spatial model using dropFixed or dropRandom. For the P-spline models with second-order differencing, the model matrices used to fit the pairs of random terms (i) spl(row.covar):int(col.covar) and spl(row.covar):col.covar and (ii) int(row.covar):spl(col.covar) and row.covar:spl(col.covar) are transformed using the spectral decomposition of their penalty matrices. An unstructured variance model is tried for each of these pairs and retained if it improves the fit. For TPPSC2, it is also possible to optimize the rotation of the null-space eigenvectors of the penalty matrix for each of these random-term pairs (for more information see Piepho, Boer and Williams, 2022). The optimization is achieved either using an optimizer or takes the form of a search over a grid of rotation angles for a reduced model; the fit of the full model with rotation using the optimal rotation angles will only be returned if it improves on the fit of the full, unrotated model.

The TPPSC2 and TPPSL1 models are fitted using functions from the R package TPSbits authored by Sue Welham (2022). There are two methods for supplying the spline basis information produced by tpsmmb to asreml. The grp method adds it to the data.frame supplied in the data argument of the asreml call. The mbf method creates smaller data.frames with the spline basis information in the same environment as the internal function that calls the spline-fitting function. If it is desired to use in a later session, an asreml function, or asrtests function that calls asreml, (e.g. predict.asreml, predictPlus.asreml, or changeTerms.asrtests) on an asreml.object created using mbf terms, then the mbf data.frames will need to be recreated using makeTPPSplineMats.data.frame in the new session, supplying, if there has been rotation of the penalty matrix eigenvectors, the theta values that are returned as the attribute theta.opt of the asreml.obj.

All models utlize the function changeModelOnIC.asrtests to assess the model fit, the information criteria used in assessing the fit being calculated using infoCriteria. Arguments from tpsmmb and changeModelOnIC.asrtests can be supplied in calls to chooseSpatialModelOnIC.asrtests and will be passed on to the relevant function though the ellipses argument (...).

The data for experiment can be divided into sections and an attempt to fit the same spatial model to each is made. The fit may differ for each of the sections, but the fit over all of the sections is assessed. For more detail see sections above.

Each combination of a row.coords and a col.coords does not have to specify a single observation; for example, to fit a local spatial model to the main units of a split-unit design, each combination would correspond to a main unit and all subunits of the main unit would have the same combination.

Value

A list containing four components: (i) asrts, (ii) spatial.IC, (iii) best.spatial.mod, and (iv) best.spatial.IC.

The component asrts itself holds a list of one or more asrtests.objects, either the best overall out of the supplied model and the spatial models, or, for each spatial model, the best out of the supplied model and that spatial model. Each asrtests.object contains the components: (i) asreml.obj, (ii) wald.tab, and (iii) test.summary. If the asrtests.object is the result of fitting a TPPSC2 model with an exploration of the rotation of the eigenvectors of the penalty matrix for the linear components, then the asreml.obj will have an attribute theta.opt that contains the optimal rotation angles of the eigenvectors.

The spatial.IC component holds a data.frame with summary of the values of the information criteria for the supplied model and those resulting from adding the spatial models to the supplied model. In the case of a spatial correlation model, the information criteria for the selected spatial correlation model is returned. If a spatial model could not be fitted, then all returned values will be NA).

The best.spatial.mod component is a character giving the name of the best spatial model, and best.spatial.AIC gives the value of its AIC.

Author(s)

Chris Brien

References

Piepho, H.-P., Boer, M. P., & Williams, E. R. (2022). Two-dimensional P-spline smoothing for spatial analysis of plant breeding trials. Biometrical Journal, 64, 835-857.

Rodriguez-Alvarez, M. X., Boer, M. P., van Eeuwijk, F. A., & Eilers, P. H. C. (2018). Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spatial Statistics, 23, 52-71.

Verbyla, A. P., De Faveri, J., Wilkie, J. D., & Lewis, T. (2018). Tensor Cubic Smoothing Splines in Designed Experiments Requiring Residual Modelling. Journal of Agricultural, Biological and Environmental Statistics, 23(4), 478-508.

Welham, S. J. (2022) TPSbits: Creates Structures to Enable Fitting and Examination of 2D Tensor-Product Splines using ASReml-R. Version 1.0.0 https://mmade.org/tpsbits/

See Also

as.asrtests, makeTPPSplineMats.data.frame, addSpatialModelOnIC.asrtests,
addSpatialModel.asrtests, changeModelOnIC.asrtests, changeTerms.asrtests,
rmboundary.asrtests, testranfix.asrtests, testresidual.asrtests, newfit.asreml,
reparamSigDevn.asrtests, changeTerms.asrtests, infoCriteria.asreml

Examples

## Not run: 

data(Wheat.dat)

#Add row and column covariates
Wheat.dat <- within(Wheat.dat, 
                    {
                      cColumn <- dae::as.numfac(Column)
                      cColumn <- cColumn  - mean(unique(cColumn))
                      cRow <- dae::as.numfac(Row)
                      cRow <- cRow - mean(unique(cRow))
                    })

#Fit initial model
current.asr <- asreml(yield ~ Rep + WithinColPairs + Variety, 
                      random = ~ Row + Column,
                      data=Wheat.dat)

#Create an asrtests object, removing boundary terms
current.asrt <- as.asrtests(current.asr, NULL, NULL, 
                            label = "Random Row and Column effects")
current.asrt <- rmboundary(current.asrt)

# Choose the best of four models for the local spatial variation
current.asrt <- chooseSpatialModelOnIC(current.asrt, 
                                       row.covar = "cRow", col.covar = "cColumn",
                                       dropRowterm = "Row", dropColterm = "Column",
                                       asreml.option = "grp")

## End(Not run)

[Package asremlPlus version 4.4.32 Index]