R: Linear and Multilinear Genetic Regressions

Genetic regression {noia}

R Documentation

Linear and Multilinear Genetic Regressions

Description

These functions estimate genetic effects from a population in which both the genotypes and the phenotypes are known.

Usage

linearRegression(phen, gen=NULL, genZ=NULL, 
    reference="noia", max.level=NULL, max.dom=NULL, fast=FALSE)
multilinearRegression(phen, gen=NULL, genZ=NULL, 
    reference="noia", max.level=NULL, max.dom=NULL, fast=FALSE, 
    e.unique=FALSE, start.algo = "linear", start.values=NULL, 
    robust=FALSE, bilinear.steps=1, ...)

Arguments

`phen`	The vector of individual phenotypes measured in the population.
`gen`	The matrix of individual genotypes in the population, one column per locus. See `genNames` for the genotype encoding. Not necessary if `genZ` is provided.
`genZ`	The matrix of individual genotypic probabilities in the population, 3 columns per locus, corresponding to the probabilities of the 3 genotypes (for each locus, the 3 probabilities must sum to 1). Not necessary if `gen` is provided.
`reference`	The reference point from which the regression is performed. By default, the `"noia"` reference point is used, since it gives an orthogonal or nearly orthogonal decomposition of genetic effects (see Details). Other possibilities are `"G2A"`, `"F2"`, `"F1"`, `"Finf"`, `"UWR"`, `"P1"` and `"P2"`.
`max.level`	Maximum level of interactions.
`max.dom`	Maximum level for dominance effects. Does not have any effect if >= `max.level`. In the multilinear regression, the maximum level for dominance effects cannot be > 1.
`fast`	This "fast" algorithm should be used when (i) the number of loci is high (> 8) and (ii) there are uncertainties in the dataset (missing values or Haley-Knott regression). It computes the regression matrix directly, i.e. without computing the `Z` and `S` matrices.
`e.unique`	Whether the multilinear term is the same for all pairs.
`start.algo`	Algorithm used to compute the starting values. Can be `"linear"`, `"multilinear"`, `"subset"` or `"bilinear"`. Ignored if `start.values` are provided.
`start.values`	Vector of starting values.
`robust`	If `TRUE`, tries all starting-value algorithms sequentially.
`bilinear.steps`	Number of steps of the bilinear algorithm. Ignored if `start.algo` is not `"bilinear"`. If `NULL`, the bilinear algorithm is run until (almost) convergence.
`...`	Extra parameters to the non-linear regression function `nls`, including `nls.control`.

Details

If a gen data set is provided, it is converted into a genZ matrix. Missing data (unknown genotypes) are treated as loci for which the genotypic probabilities are identical to the genotypic frequencies in the population.
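The conversion described above can be sketched in base R. This is only an illustration, not the code used by the package: the helper name `locus_to_probs` is hypothetical, and the genotype coding 1/2/3 with `NA` for unknown genotypes is an assumption (see `genNames` for the actual encoding).

```r
# Hypothetical helper (not part of noia): expand one locus, assumed to be
# coded 1/2/3 with NA for unknown genotypes, into a 3-column probability matrix.
locus_to_probs <- function(g) {
  known <- !is.na(g)
  # Genotype frequencies observed among the known genotypes
  freq <- tabulate(g[known], nbins = 3) / sum(known)
  probs <- matrix(0, nrow = length(g), ncol = 3)
  # Known genotypes get probability 1 in the matching column
  probs[cbind(which(known), g[known])] <- 1
  # Unknown genotypes get the population genotype frequencies
  if (any(!known)) {
    probs[!known, ] <- rep(freq, each = sum(!known))
  }
  probs
}

# Toy data: 5 individuals, 2 loci, one missing genotype per locus
gen <- cbind(Loc1 = c(1, 2, 3, 2, NA), Loc2 = c(2, 2, NA, 1, 3))

# One block of 3 columns per locus, side by side
genZ_sketch <- do.call(cbind, lapply(seq_len(ncol(gen)),
                                     function(j) locus_to_probs(gen[, j])))
genZ_sketch
```

Each block of 3 columns sums to 1 for every individual, which is the constraint stated for the `genZ` argument.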

The algebraic framework is described extensively in Alvarez-Castro & Carlborg 2007. The default reference point ("noia") provides an orthogonal decomposition of genetic effects in the 1-locus case, whatever the genotypic frequencies. It remains a good approximation of orthogonality in the multi-locus case if linkage disequilibrium is small. Other optional reference points are those of the "G2A" model (Zeng et al. 2005) and of the unweighted regression model "UWR" (Cheverud & Routman 1995). Several key populations can be taken as reference as well: "F2", "F1", "Finf" (F infinity), and the two "parental" homozygous populations "P1" and "P2".
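The 1-locus orthogonality property can be checked numerically. The sketch below builds a 1-locus design matrix with a mean, an additive and a dominance column, following the statistical formulation as recalled from Alvarez-Castro & Carlborg (2007); the variable names are illustrative, the package's internal matrices may be laid out differently, and the scaling term `V` does not affect the orthogonality check.

```r
# Arbitrary genotype frequencies for genotypes 11, 12, 22
# (deliberately far from Hardy-Weinberg proportions)
p11 <- 0.5; p12 <- 0.2; p22 <- 0.3
p <- c(p11, p12, p22)

m <- p12 + 2 * p22                      # mean allele count in the population
V <- p11 + p22 - (p11 - p22)^2          # scaling term for the dominance column

S <- cbind(
  mean = 1,
  a    = c(0, 1, 2) - m,                                     # centered additive scale
  d    = c(-2 * p12 * p22, 4 * p11 * p22, -2 * p11 * p12) / V  # dominance scale
)

# Frequency-weighted cross-products between the columns of S
cross <- t(S) %*% (p * S)
round(cross, 12)
```

The off-diagonal entries of `cross` are zero for any choice of `p11`, `p12`, `p22` summing to 1, which is what "orthogonal whatever the genotypic frequencies" means in the 1-locus case.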

The multilinear model for genetic interactions is an alternative way to model epistatic interactions between at least two loci (see Hansen & Wagner 2001). Computing multilinear estimates requires a non-linear regression step that relies on the nls function. Good starting values for the non-linear regression are key to ensuring convergence, and several algorithms are available to compute them, selected through the "start.algo" option. "linear" performs a linear regression and approximates the genetic effects from it, while "multilinear" performs a simpler multilinear regression (without dominance) to initialize the genetic effects. "subset" estimates all genetic effects from a random subset (50%) of the population, and "bilinear" alternately estimates marginal and epistatic effects.
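The idea behind the "linear" starting-value strategy can be sketched with base R only. The example assumes a deliberately simplified two-locus, additive-only multilinear form, phenotype = mu + a1\*x1 + a2\*x2 + e\*a1\*a2\*x1\*x2, with genotypes coded as allele counts 0/1/2. This parameterization and all variable names are assumptions made for illustration, not the model fitted internally by multilinearRegression.

```r
set.seed(1)
N  <- 2000
# Allele counts at two unlinked loci in F2-like proportions
x1 <- sample(0:2, N, replace = TRUE, prob = c(0.25, 0.5, 0.25))
x2 <- sample(0:2, N, replace = TRUE, prob = c(0.25, 0.5, 0.25))

# Simulated phenotypes under the simplified multilinear form
mu <- 1; a1 <- 1; a2 <- 1; e <- 0.5
y  <- mu + a1 * x1 + a2 * x2 + e * a1 * a2 * x1 * x2 + rnorm(N, sd = 0.2)
d  <- data.frame(y = y, x1 = x1, x2 = x2)

# Step 1: ordinary linear regression with an interaction term
b <- coef(lm(y ~ x1 * x2, data = d))

# Step 2: approximate the multilinear parameters from the linear estimates
start <- list(
  mu = b[["(Intercept)"]],
  a1 = b[["x1"]],
  a2 = b[["x2"]],
  e  = b[["x1:x2"]] / (b[["x1"]] * b[["x2"]])
)

# Step 3: non-linear regression started from these values
fit <- nls(y ~ mu + a1 * x1 + a2 * x2 + e * a1 * a2 * x1 * x2,
           data = d, start = start)
coef(fit)
```

In this toy case the linear and multilinear models have the same number of parameters, so the linear fit already sits at the optimum. With more loci, or with `e.unique=TRUE`, the multilinear model is more constrained than the linear one and the linear estimates only approximate it, which is why several starting-value algorithms are offered.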

Value

linearRegression and multilinearRegression return an object of class "noia.linear" or "noia.multilinear", both having their own print methods: print.noia.linear and print.noia.multilinear.

Author(s)

Arnaud Le Rouzic

References

Alvarez-Castro JM, Carlborg O. (2007). A unified model for functional and statistical epistasis and its application in quantitative trait loci analysis. Genetics 176(2):1151-1167.

Alvarez-Castro JM, Le Rouzic A, Carlborg O. (2008). How to perform meaningful estimates of genetic effects. PLoS Genetics 4(5):e1000062.

Cheverud JM, Routman EJ. (1995). Epistasis and its contribution to genetic variance components. Genetics 139:1455-1461.

Hansen TF, Wagner G. (2001). Modeling genetic architecture: A multilinear theory of gene interactions. Theoretical Population Biology 59:61-86.

Le Rouzic A, Alvarez-Castro JM. (2008). Estimation of genetic effects and genotype-phenotype maps. Evolutionary Bioinformatics 4.

Zeng ZB, Wang T, Zou W. (2005). Modelling quantitative trait loci and interpretation of models. Genetics 169:1711-1725.

Examples

set.seed(123456789)

map <- c(0.25, -0.75, -0.75, -0.75, 2.25, 2.25, -0.75, 2.25, 2.25)
pop <- simulatePop(map, N=500, sigmaE=0.2, type="F2")

# Regressions

linear <- linearRegression(phen=pop$phen, gen=cbind(pop$Loc1, pop$Loc2))

multilinear <- multilinearRegression(phen=pop$phen, 
    gen=cbind(pop$Loc1, pop$Loc2))

# Linear effects, associated variances and stderr
linear

# Multilinear effects
multilinear

[Package noia version 0.97.3 Index]