R: Fit a dyadic mixed model to pedigree data

dmm {dmm}

R Documentation

Fit a dyadic mixed model to pedigree data

Description

Dyadic mixed model analysis with multi-trait responses and pedigree-based partitioning of individual variation into a range of environmental and genetic variance components for individual and maternal effects.

Usage

dmm(mdf, fixform = Ymat ~ 1, components = c("VarE(I)", "VarG(Ia)"),
    specific.components=NULL, cohortform = NULL, posdef = T, gls = F,
    glsopt = list(maxiter = 200, bdamp = 0.8, stoptol = 0.01),
    dmeopt = "qr", ncomp.pcr = "rank", relmat = "inline", dmekeep = F,
    dmekeepfit = F, traitspairwise=F, traitsblockwise=F,...)

## Default S3 method:
dmm(mdf, fixform = Ymat ~ 1, 
    components = c("VarE(I)", "VarG(Ia)"),
    specific.components=NULL, cohortform = NULL, posdef = T, gls = F,
    glsopt = list(maxiter = 200, bdamp = 0.8, stoptol = 0.01),
    dmeopt = "qr", ncomp.pcr = "rank", relmat = "inline", dmekeep = F,
    dmekeepfit = F, traitspairwise=F, traitsblockwise=F,...)

Arguments

`mdf`	Either a dataframe or an object of class `mdf` which is a list containing a dataframe and one or more relationship matrices, as made by function `mdf()` If missing the variables are searched for in the standard way.
`fixform`	A formula specifying the fixed-effect part of the model and the response variate(s). Response should be a matrix for multi-trait models. Default is `Ymat ~ 1` that is a response matrix called `Ymat` and a fitted mean effect.
`components`	A simple vector specifying each of the components to be partitioned from the residual variation after fitting fixed effects. Residual is assumed to be the level of variation attributed to individuals. The components given here are assumed to sum to phenotypic variance so that if there are cross-effect covariances (eg "CovG(Ia,Ma)" and "CovG(Ma,Ia)") they need to be both present. The default is `c("VarE(I)", "VarG(Ia)")`, that is a simple partitioning into individual environmental ("VarE(I)") and individual additive genetic ("VarG(Ia)") variation. It is necessary to specify the individual environmental variance, as it is actually fitted as a parameter in the dyadic model, not obtained from residual variation. For a full list of available components see `make.ctable()` function. For a brief introduction to the notation for components see Note section below. For a comprehensive definition of notation see 'dmmOverview.pdf' Section 6.
`specific.components`	A list specifying the name of each specific factor and the variance components which are to be specific to that factor. The list takes the form `list(Factor1=c(component1,component2,...),` `Factor2=c(component3,component4,...),...)`. The default is `NULL`, that is no class specific components are partitioned . Each specific factor must exist as a column in the dataframe. The specific factors do not have to be fitted as fixed effects, but can be. The classes ( ie levels) within each factor must be mutually exclusive, that is, each individual can only belong to one level of each factor. For a full coverage of class specific components see the document dmmClassSpecific.pdf. If one wishes all of the specified components to be class specific, one needs to specify `components=NULL` in order to cancel out the default for the `components=` argument.
`cohortform`	A formula specifying the effects which define cohort grouping of individuals. For example `cohortform = ~ Year`. A cohort is a grouping of individuals experiencing the same external environmental conditions, eg a group of sheep born and reared together, commonly referred to as a drop. Cohort should not contain DId - the dam's Id. If one needs to consider littermates, function `dmm()` provides a means of combining maternal environmental and cohort variance components to achieve this. The default is NULL - ie no cohort defined.
`posdef`	A logical flag: should the matrices of variance components be constrained to be positive definite? If TRUE each matrix of cross-trait (co)variances for each "Varxxx" component defined in `components` will be individually positive definite, and each cross-effect covariance (if "Covxxx" components are defined) will be constrained such that the corresponding correlation is in the bounds -1 to 1. If FALSE all components will be as estimated. The default is TRUE.
`gls`	A logical flag: should `dmm()` go on after fitting fixed effects by OLS and estimating components, to re-fit the fixed effects by GLS and re-estimate the components. If TRUE the option `posdef=T` is enforced, as the GLS iteration will fail if matrixes do not remain positive definite. Default is FALSE - ie do the OLS analysis only.
`glsopt`	A list object containing variables used to control the GLS iteration : maxiter Maximum number of iterations. Default 200. bdamp Factor used to damp the setting of new GLS-b coefficients at each round of iteration. A value of 1.0 means no damping. stoptol Value below which the sum of absolute deviations of new from old coefficients must fall to achieve convergence. The GLS iteration normally converges very rapidly. If it does not, consider changing the model, before fiddling with these parameters.
`dmeopt`	One of four regression techniques used to solve the dyadic model equations (DME's) to estimate components: "qr" The default option is to use the QR algorithm directly on the dyadic model equations. This is most efficient, but does not produce a fit object for looking at further statistics such as with `anova` or `plot` or `resid`. "lm" This option calls the `lm()` function to solve the DME's. This is equivalent to QR, but `lm()` produces a fit object which can optionally be part of the returned `dmm` object. "lmrob" This option calls the `lmrob()` function from package `robustbase` to solve the DME's. Robust regression only works for single-trait models. A fit object can be returned. "pcr" This option calls the `mvr()` function from package `pls` with argument `method="svdpc"`. Principal component regression is intended to be used where there are multicollinearities among the components to be estimated. The number of principal components is set to the rank of the DME matrix, but can be overwritten (see `ncomp.pcr` argument). A fit object can be returned. If gls=TRUE the same `dmeopt` option is also used during the GLS iteration.
`ncomp.pcr`	Number of principal components to use during a principal components regression (see `dmeopt` argument). Default is the rank of the DME matrix
`relmat`	One of two ways of setting up the relationship matrices required to estimate the variance components: "inline" The additive genetic relationship matrix will be calculated by inline code each time `dmm()` is run. OK for small datasets. Do not use if non-additive relationship matrices are required. This is the default. "withdf" The required relationship matrices are assumed to be pre-stored in the object of class `mdf` defined in the first argument. See function `mdf()` for setting up relationship matrices in the `mdf` object. Function `mdf()` makes extensive use of the package `nadiv`.
`dmekeep`	Logical flag: should the dyadic model equations be returned as part of the `dmm()` return object? Default is FALSE. The DME's may be a large object.
`dmekeepfit`	Logical flag: should the fit object from solving the DME's be returned as part of the `dmm()` return object. Default is FALSE. The fit object may be large.
`traitspairwise`	Logical flag: should the traits be analysed two at a time in all permutations? Default is FALSE, in which case traits are all analysed simultaneously. This option is useful if traits have different replication. If this option is TRUE `dmm()` will return an object of class `dmmarray`, being an array of class `dmm` objects with the rows and columns named by the trait names. See the Value section for the structure of an object of class `dmmarray`. For this option the dataframe specified as argument `mdf` must be made with function `mdf()` and must contain the matrix of traits 'Ymat' plus the individual traits as columns (`keep=T` option for `mdf()`).
`traitsblockwise`	Logical flag: should the traits be analysed in defined blocks of traits in all permutations of pairs of blocks? Default is FALSE. This option is useful if blocks of traits have different replication. If this option is TRUE, the ellipsis option defining blocks must be present. For this option the dataframe specified as argument `mdf` must be made with function `mdf()` and must contain the matrix of traits 'Ymat' plus the individual traits as columns (`keep=T` option for `mdf()`).
`...`	Ellipsis argument: if `traitsblockwise` is TRUE this argument should contain a number of block definitions of the form `Block1=c("Trait1","Trait2"),Block2=c("Trait3","Trait4"),...` , specifying the traits to be present in each block. In this case `dmm()` will return an object of class `dmmblockarray`, being an array of class `dmm` objects with the rows and columns named `"Block1" and "Block2"`, together with lists of the trait names present in each block. See the Value section for the structure of an object of class `dmmblockarray`.

Details

The minimum requirement to use dmm() directly on a simple dataframe is that it contain columns named "Id", "SId", "DId", and "Sex" plus any fixed effects and traits. The "Id" column must contain identifiers which are unique, numeric, and sequential ( ie they must be numbered 1 to n with unit increments, no duplicates and no gaps). Any fixed effects must be factors, and traits must be numeric. Every "SId" and "DId" code must appear also in the "Id" column even if this results in NA's in every other column. If these requirements are not met, process the dataframe with mdf() before using dmm() Also if any relationship matrix other than additive is required, pre-processing with mdf() is necessary.

Missing values for either traits or fixed effects are simply omitted by dmm() before any processing. There is an heirarchy of models fitted by dmm(). There is one fixed model and one dyadic model, for all traits, and only individuals for which all traits are present are included in the model fit steps. In contrast, all individuals are included in the pedigree and in setting up relationship matrices. Hence the number of individuals with data, may be less than the number of individuals in the pedigree. If options traitspairwise or traitsblockwise are used both the fixed model and the dyadic model will be the same for all traitpairs (or traitblocks) but the replication may differ, so missing values for some traits or sets of traits can be handled in this way.

The (co)variance which is partitioned into components is always the residual (co)variance from the fixed effects model. This is assumed to represent the observed variation among individuals. There is, at this stage, no provision for models with more than one error level, so split plot and repeated measures designs are not provided for. There is nothing to stop one formulating the appropriate fixed effects model and doing the aov() step, but partitioning of any (co)variance other than residual is not at present provided.

The naming conventions for components may seem a little strange. They are designed to be all ASCII and therefore usable by R as rownames or colnames. The function make.ctable() returns a list of all available components ( as returnobject$all), as well as a spectrum of sublists which are used internally. The available components are fully documented in the pdf file dmmOverview.pdf Section 6. Most of the names are obvious (eg "VarG(Ia)" means variance-genetic-individual-additive). The term individual distinguishes individual or direct genetic or environmental effects from maternal genetic or environmental effects.

It is important for the proper estimation of phenotypic (co)variance that any cross-effect covariance components are fitted in symmetric pairs ( for example "CovE(I,M)" and "CovE(M,I)"). For one trait these will be identical, so the covariance will simply enter twice in the sum, as required. However cross-trait-cross-effect covariances will not be identical and the sum, which is a phenotypic covariance, requires that the symmetric pair be present.

In addition to the value returned, dmm() makes a number of lines of screen output which show each processing step and some minimal model check numbers.

Value

An object of class dmm is returned whenever options traitspairwise and traitsblockwise are both FALSE (ie a normal multi-trait analysis). This object is basically a list of some or all of the following items:

`aov`	An object of class `aov` containing the results of fitting the fixed effects (specified in argument `fixform`) by OLS using a call to the `aov()` function
`mdf`	Not currently used - ignore.
`fixform`	Formula specifying fixed effects fitted.
`b`	Coefficients fitted for fixed effects.
`seb`	Standard errors for fixed effect coefficients.
`vara`	Matrix of (co)variances of individuals after adjusting for fixed effects fitted by OLS.
`totn`	Total number of individuals in the analysis (ie with data)
`degf`	Degrees of freedom remaining after fitting fixed effects.
`dme.wmat`	The dyadic model equations matrix. Present only if dmekeep=TRUE. The name 'wmat' refers to the matrix `W` in equations 12 and 13 of the document dmmOverview.pdf.
`dme.psi`	The dyadic model equations right hand sides (or traits) matrix. Present only if dmekeep=TRUE. The name 'psi' refers to the matrix `\Psi` in equations 12 and 13 of the document dmmOverview.pdf.
`dme.fit`	The fit object from solving dyadic model equations. Its form depends on the `dmeopt` argument. For dmeopt="qr" (the default) it just contains the QR transform of the DME's. For dmeopt="lm" it contains the object returned by function `lm()` For dmeopt="lmrob" it contains the object returned by function `lmrob()` For dmeopt="pcr" it contains the object returned by function `mvr()` Present only if dmekeepfit=TRUE.
`dme.mean`	Means of columns of dyadic model equation matrix
`dme.var`	Variances of columns of dyadic model equation matrix
`dme.correl`	Correlations between columns of dyadic model equation matrix.
`pcr.loadings`	Loadings from principal component regression. Only present when dmeopt="pcr".
`dmeopt`	Value of `dmeopt` argument in call to `dmm()` function. Specifies method used to solve DME's, and hence the type of fit object (if one is present).
`siga`	Variance component estimates obtained by solving DME's.
`sesiga`	Standard errors of variance component estimates.
`vard`	Residual (co)variance matrix from solving DME's
`degfd`	Degrees of freedom for residual covariance matrix.
`component`	Vector of component names from the `component` argument in call to `dmm()` function, with the element "VarP(I)" appended.
`correlation`	Estimated genetic or environmental correlation coefficient corresponding to each component
`correlation.variance`	Estimated sampling variance of each correlation coefficient.
`correlation.se`	Estimated standard error of each correlation coefficient.
`fraction`	Estimated proportion of variance (relative to the total phenotypic (co)variance)) corresponding to each component. For example the proportion corresponding to component "VarG(Ia)" is the usual additive genetic heritability.
`fraction.variance`	Estimated sampling variance of each proportion.
`fraction.se`	Estimated standard error of each proportion.
`variance.components`	Variance component estimates ( as in siga) but with their total which is phenotypic (co)variance appended.
`variance.components.se`	Standard errors of variance component estimates, including phenotypic (co)variance.
`phenotypic.variance`	Phenotypic (co)variances as a trait x trait matrix.
`phenotypic.variance.se`	Standard errors of phenotypic (co)variances as a matrix.
`observed.variance`	Observed variance, adjusted for fixed effects, in current population. Will differ from phenotypic variance because related animals are correlated. All the estimated components, and their sum ( which is phenotypic variance) are estimates of what the components would be in a population of unrelated individuals.
`call`	The call made to `dmm()` function to generate this object
`gls`	Another list containing all the above items, but for fixed effects fitted by GLS instead of OLS. Will only be present if gls=TRUE and if the gls iteration converged successfully.
`specific`	Another list containing those of the above items which are relevant when class-specific (co)varianc components are estimated. Will only be present if the argument `specific.components` is not NULL, that is if at least one class-specific component is fitted.

If option traitspairwise is TRUE, the value returned by dmm() is an object of class dmmarray, which is an array of which each element is an object of class dmm representing an analysis for one pair of traits. The rows and columns of the array are named using trait names.

If option traitsblockwise is TRUE, the value returned by dmm() is an object of class dmmblockarray, which is a list of two items named array and blocks. List item array is an array of which each element is an object of class dmm representing an analysis for one pair of blocks of traits. The rows and columns of the array are named using block names. List item blocks is a list with one element per block, each containig the set of trait names present in the block.

The functions condense.dmmarray and condense.dmmblockarray are available to facilitate recombining an array of dmm objects into a single object of class dmm with the variance component and genetic parameter estimates appropriately assembled into multi-trait matrices. For example, one would use these functions to prepare an input file for function gresponse.

Note

Two methods of estimating fixed effects are offered by dmm() - termed OLS-b and GLS-b. OLS-b is computationally simple and non-iterative and is the default. Use OLS-b for preliminary runs until the set of components to be estimated from the dyadic model equations is settled. Use GLS-b for the final run. OLS-b leads to MINQUE estimates of the variance components and OLS estimates of the fixed effects. GLS-b leads to bias-corrected-ML estimates of the variance components , and GLS estimates of the fixed effects.

The notation for (co)variance components was designed to use ASCII characters only so that it could be usable as dimnames in R. Some examples should make it clear

"VarE(Ia)": variance environmental individual additive
"VarG(Ia)": variance genetic individual additive
"VarG(Ma)": variance genetic maternal additive
"CovG(Ia,Ma)": covariance genetic individual additive x maternal additive
"VarGs(Ia)": variance genetic sexlinked individual additive

For a full coverage of notation see the document dmmOverview.pdf Section 6.

Author(s)

Neville Jackson

References

The document dmmOverview.pdf has a bibliography of literature references.

Examples


library(dmm)
# Prepare the dataset sheep.df
data(sheep.df)
sheep.mdf <- mdf(sheep.df,pedcols=c(1:3),factorcols=c(4:6),ycols=c(7:9),
             sexcode=c("M","F"),relmat=c("E","A","D"))
# The above code renumbers the pedigree Id's, makes columns "Year","Tb","Sex"
#    into factors,
#    assembles columns "CWW",Diam","Bwt" into a matrix (called 'Ymat')
#    for multivariate processing,
#    and sets up the environmental, additive genetic, and dominance genetic
#    relationship matrices.

# a simple model with individual environmenmtal and 
#    additive genetic components (default)
sheep.fit <- dmm(sheep.mdf, Ymat ~ 1 + Year + Sex,
   components=c("VarE(I)","VarG(Ia)"),gls=TRUE)
# view the components and fixed effect coefficients ( 2 traits only)
summary(sheep.fit,traitset=c("Cww","Diam"),gls=TRUE)
# view the genetic parameters
gsummary(sheep.fit,traitset=c("Cww","Diam"))

rm(sheep.df)
rm(sheep.mdf)
rm(sheep.fit)

# Note: sheep.df is a small demo dataset. The results are illustrative, 
# but not meaningful.
# for a tutorial and fully documented examples see {\em dmmOverview.pdf}

[Package dmm version 2.1-10 Index]