stepwiseqtl {qtl} | R Documentation |
Stepwise selection for multiple QTL
Description
Performs forward/backward selection to identify a multiple QTL model, with model choice made via a penalized LOD score, with separate penalties on main effects and interactions.
Usage
stepwiseqtl(cross, chr, pheno.col=1, qtl, formula, max.qtl=10, covar=NULL,
method=c("imp", "hk"), model=c("normal", "binary"),
incl.markers=TRUE, refine.locations=TRUE,
additive.only=FALSE, scan.pairs=FALSE, penalties,
keeplodprofile=TRUE, keeptrace=FALSE, verbose=TRUE,
tol=1e-4, maxit=1000, require.fullrank=FALSE)
Arguments
cross |
An object of class |
chr |
Optional vector indicating the chromosomes to consider in
search for QTL. This should be a vector of character strings
referring to chromosomes by name; numeric values are converted to
strings. Refer to chromosomes with a preceding |
pheno.col |
Column number in the phenotype matrix which should be used as the phenotype. One may also give character strings matching the phenotype names. Finally, one may give a numeric vector of phenotypes, in which case it must have the length equal to the number of individuals in the cross, and there must be either non-integers or values < 1 or > no. phenotypes; this last case may be useful for studying transformations. |
qtl |
Optional QTL object (of class |
formula |
Optional formula to define the QTL model to be used as a starting point. |
max.qtl |
Maximum number of QTL to which forward selection should proceed. |
covar |
Data frame of additive covariates. |
method |
Indicates whether to use multiple imputation or Haley-Knott regression. |
model |
The phenotype model: the usual model or a model for binary traits |
incl.markers |
If FALSE, do calculations only at points on an evenly spaced grid. |
refine.locations |
If TRUE, use |
additive.only |
If TRUE, allow only additive QTL models; if FALSE, consider also pairwise interactions among QTL. |
scan.pairs |
If TRUE, perform a two-dimensional, two-QTL scan at each step of forward selection. |
penalties |
Vector of three (or six) values indicating the penalty on the number of QTL terms. If three values, these are the penalties on main effects and heavy and light penalties on interactions. If six values, these include X-chr-specific penalties, and the values are: main effect for autosomes, main effect for X chr, heavy penalty on A:A interactions, light penalty on A:A interactions, penalty on A:X interactions, and penalty on X:X interactions. See the Details below. If missing, default values are used that are based on simulations of backcrosses and intercrosses with genomes modeled after that of the mouse. |
keeplodprofile |
If TRUE, keep the LOD profiles from the last iteration as attributes to the output. |
keeptrace |
If TRUE, keep information on the sequence of models visited through the course of forward and backward selection as an attribute to the output. |
verbose |
If TRUE, give feedback about progress. If
|
tol |
Tolerance for convergence for the binary trait model. |
maxit |
Maximum number of iterations for fitting the binary trait model. |
require.fullrank |
If TRUE, give LOD=0 when covariate matrix in the linear regression is not of full rank. |
Details
We seek to identify the model with maximal penalized LOD score. The
penalized LOD score, defined in Manichaikul et al. (2009),
is the LOD score for the model (the \log_{10}
likelihood
ratio comparing the model to the null model with no QTL) with
penalties on the number of QTL and QTL:QTL interactions.
We consider QTL models allowing pairwise interactions among QTL but with an enforced hierarchy in which inclusion of a pairwise interaction requires the inclusion of both of the corresponding main effects. Additive covariates may be included, but currently we do not explore QTL:covariate interactions. Also, the penalized LOD score criterion is currently defined only for autosomal loci, and results with the X chromosome should be considered with caution.
The penalized LOD score is of the form pLOD(\gamma) = LOD(\gamma)
- T_m p_m - T_h p_h - T_l p_l
where \gamma
denotes a model,
p_m
is the number of QTL in the model ("main effects"),
p_h
is the number of pairwise interactions that will be
given a heavy interaction penalty, p_l
is the number of pairwise
interactions that will be given a light interaction penalty,
T_m
is the penalty on main effects, T_h
is the heavy
interaction penalty, and T_l
is the light interaction
penalty. The penalties
argument is the vector (T_m, T_h,
T_l)
. If T_l
is missing (penalties
has a vector of length 2), we assume T_l = T_h
, and so
all pairwise interactions are assigned the same penalty.
The "heavy" and "light" interaction penalties can be a bit
confusing. Consider the clusters of QTL that are connected via one or
more pairwise interactions. To each such cluster, we assign at most
one "light" interaction penalty, and give all other pairwise
interactions the heavy interaction penalty. In other words, if
p_i
is the total number of pairwise interactions for a QTL
model, we let p_l
be the number of clusters of connected QTL
with at least one pairwise interaction, and then let p_h - p_i -
p_l
.
Let us give an explicit example. Consider a model with 6 QTL, and
with interactions between QTL 2 and 3, QTL 4 and 5 and QTL 4 and 6
(so we have the model formula
y ~ Q1 + Q2 + Q3 + Q4 + Q5 + Q6 + Q2:Q3 + Q4:Q5 + Q4:Q6
).
There are three clusters of connected QTL: (1), (2,3) and (4,5,6). We
would assign 6 main effect penalties (T_m
), 2 light
interaction penalties (T_l
), and 1 heavy interaction penalty
(T_h
).
Manichaikul et al. (2009) described a system for deriving the
three penalties on the basis of permutation results from a
two-dimensional, two-QTL genome scan (as calculated with
scantwo
). These may be calculated with the
function calc.penalties
.
A forward/backward search method is used, with the aim to optimize the penalized LOD score criterion. That is, we seek to identify the model with maximal the penalized LOD score. The search algorithm was based closely on an algorithm described by Zeng et al. (1999).
We use forward selection to a model of moderate size (say 10 QTL),
followed by backward elimination all the way to the null model. The
chosen model is that which optimizes the penalized LOD score
criterion, among all models visited. The detailed algorithm is as
follows. Note that if additive.only=TRUE
, no pairwise
interactions are considered.
Start at the null model, and perform a single-QTL genome scan, and choose the position giving the largest LOD score. If
scan.pairs=TRUE
, start with a two-dimensional, two-QTL genome scan instead. If an initial QTL model were defined through the argumentsqtl
andformula
, start with this model and jump immediately to step 2.With a fixed QTL model in hand:
Scan for an additional additive QTL.
For each QTL in the current model, scan for an additional interacting QTL.
If there are
\ge
2 QTL in the current model, consider adding one of the possible pairwise interactions.If
scan.pairs=TRUE
perform a two-dimensional, two-QTL scan, seeking to add a pair of novel QTL, either additive or interacting.Step to the model that gives the largest value for the model comparison criterion, among those considered at the current step.
Refine the locations of the QTL in the current model (if
refine.locations=TRUE
).Repeat steps 2 and 3 up to a model with some pre-determined number of loci.
Perform backward elimination, all the way back to the null model. At each step, consider dropping one of the current main effects or interactions; move to the model that maximizes the model comparison criterion, among those considered at this step. Follow this with a refinement of the locations of the QTL.
Finally, choose the model having the largest model comparison criterion, among all models visited.
In this forward/backward algorithm, it is likely best to build up to an overly large model and then prune it back. Note that there is no "stopping rule"; the chosen model is that which optimizes the model comparison criterion, among all models visited. The search can be time consuming, particularly if a two-dimensional scan is performed at each forward step. Such two-dimensional scans may be useful for identifying QTL linked in repulsion (having effects of opposite sign) or interacting QTL with limited marginal effects, but our limited experience suggests that they are not necessary; important linked or interacting QTL pairs can be picked up in the forward selection to a large model, and will be retained in the backward elimination phase.
Value
The output is a representation of the best model, as measured by the
penalized LOD score (see Details), among all models visited.
This is QTL object (of class "qtl"
, as produced by
makeqtl
), with attributes "formula"
,
indicating the model formula, and "pLOD"
indicating the
penalized LOD score.
If keeplodprofile=TRUE
, LOD profiles from the last pass through
the refinement algorithm are retained as an attribute,
"lodprofile"
, to the object. These may be plotted with
plotLodProfile
.
If keeptrace=TRUE
, the output will contain an attribute
"trace"
containing information on the best model at each step
of forward and backward elimination. This is a list of objects of
class "compactqtl"
, which is similar to a QTL object (as
produced by makeqtl
) but containing just
a vector of chromosome IDs and positions for the QTL. Each will also
have attributes "formula"
(containing the model formula) and
"pLOD"
(containing the penalized LOD score.
Methods
imp
: multiple imputation is used, as described by Sen
and Churchill (2001).
hk
: Haley-Knott regression is used (regression of the
phenotypes on the multipoint QTL genotype probabilities), as described
by Haley and Knott (1992).
Author(s)
Karl W Broman, broman@wisc.edu
References
Manichaikul, A., Moon, J. Y., Sen, Ś, Yandell, B. S. and Broman, K. W. (2009) A model selection approach for the identification of quantitative trait loci in experimental crosses, allowing epistasis. Genetics, 181, 1077–1086.
Broman, K. W. and Speed, T. P. (2002) A model selection approach for the identification of quantitative trait loci in experimental crosses (with discussion). J Roy Stat Soc B 64, 641–656, 731–775.
Haley, C. S. and Knott, S. A. (1992) A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69, 315–324.
Sen, Ś. and Churchill, G. A. (2001) A statistical framework for quantitative trait mapping. Genetics 159, 371–387.
Zeng, Z.-B., Kao, C.-H. and Basten, C. J. (1999) Estimating the genetic architecture of quantitative traits. Genetical Research, 74, 279–289.
See Also
calc.penalties
,
plotModel
, makeqtl
,
fitqtl
, refineqtl
,
addqtl
, addpair
Examples
data(fake.bc)
## Not run: fake.bc <- calc.genoprob(fake.bc, step=2.5)
outsw <- stepwiseqtl(fake.bc, max.qtl=3, method="hk", keeptrace=TRUE)
# best model
outsw
plotModel(outsw)
# path through model space
thetrace <- attr(outsw, "trace")
# plot of these
par(mfrow=c(3,3))
for(i in seq(along=thetrace))
plotModel(thetrace[[i]], main=paste("pLOD =",round(attr(thetrace[[i]],"pLOD"), 2)))