scan_pvl {qtl2pleio} | R Documentation |
Perform model fitting for all ordered pairs of markers in a genomic region of interest
Description
'scan_pvl' calculates log likelihood for d-variate phenotype model fits. Inputted parameter 'start_snp' indicates where in the 'probs' object to start the scan.
Usage
scan_pvl(
probs,
pheno,
kinship = NULL,
addcovar = NULL,
start_snp = 1,
n_snp,
max_iter = 10000,
max_prec = 1/1e+08,
cores = 1
)
Arguments
probs |
an array of founder allele probabilities for a single chromosome |
pheno |
a matrix of phenotypes |
kinship |
a kinship matrix for one chromosome |
addcovar |
a matrix, n subjects by c additive covariates |
start_snp |
index of where to start the scan within probs |
n_snp |
the number of (consecutive) markers to include in the scan |
max_iter |
maximum number of iterations for EM algorithm |
max_prec |
stepwise precision for EM algorithm. EM stops once incremental difference in log likelihood is less than max_prec |
cores |
number of cores to use when parallelizing via parallel::mclapply. Set to 1 for no parallelization. |
Details
The function first discards individuals with one or more missing phenotypes or missing covariates.
It then infers variance components, Vg and Ve. Both Vg and Ve
are d by d covariance matrices. It uses an expectation maximization algorithm, as
implemented in the 'gemma2' R package. 'gemma2' R package is an R implementation of the
GEMMA algorithm for multivariate variance component estimation (Zhou & Stephens 2014 Nature methods).
Note that variance components are fitted on a model that uses the d-variate phenotype
but contains no genetic information. This model does, however,
use the specified covariates (after dropping dependent columns
in the covariates matrix).
These inferred covariance matrices, \hat{Vg}
and \hat{Ve}
,
are then used in subsequent model fitting via
generalized least squares.
Generalized least squares model fitting is applied to every d-tuple of
markers within the specified genomic region for 'scan_pvl'.
For a single d-tuple of markers, we fit the model:
vec(Y) = Xvec(B) + vec(G) + vec(E)
where
G \sim MN(0, K, \hat{Vg})
and
E \sim MN(0, I, \hat{Ve})
where MN
denotes the matrix-variate
normal distribution with three parameters: mean matrix, covariance among rows, and
covariance among columns. vec
denotes the vectorization operation, ie, stacking by columns.
K
is a kinship matrix, typically calculated by leave-one-chromosome-out methods.
Y
is the n by d phenotypes matrix. X
is a block-diagonal nd by fd matrix consisting of
d blocks each of dimension n by f. Each n by f block (on the diagonal) contains a matrix of
founder allele probabilities for the n subjects at a single marker. The off-diagonal blocks
have only zero entries.
The log-likelihood is returned for each model. The outputted object is a tibble with
d + 1 columns. The first d columns specify the markers used in the corresponding model fit, while
the last column specifies the log-likelihood value at that d-tuple of markers.
Value
a tibble with d + 1 columns. First d columns indicate the genetic data (by listing the marker ids) used in the design matrix; last is log10 likelihood
References
Knott SA, Haley CS (2000) Multitrait least squares for quantitative trait loci detection. Genetics 156: 899–911.
Jiang C, Zeng ZB (1995) Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140: 1111-1127.
Zhou X, Stephens M (2014) Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nature methods 11:407-409.
Broman KW, Gatti DM, Simecek P, Furlotte NA, Prins P, Sen S, Yandell BS, Churchill GA (2019) R/qtl2: software for mapping quantitative trait loci with high-dimensional data and multi-parent populations. GENETICS https://www.genetics.org/content/211/2/495.
Examples
# read data
n <- 50
pheno <- matrix(rnorm(2 * n), ncol = 2)
rownames(pheno) <- paste0("s", 1:n)
colnames(pheno) <- paste0("tr", 1:2)
probs <- array(dim = c(n, 2, 5))
probs[ , 1, ] <- rbinom(n * 5, size = 1, prob = 0.2)
probs[ , 2, ] <- 1 - probs[ , 1, ]
rownames(probs) <- paste0("s", 1:n)
colnames(probs) <- LETTERS[1:2]
dimnames(probs)[[3]] <- paste0("m", 1:5)
scan_pvl(probs = probs, pheno = pheno, kinship = NULL,
start_snp = 1, n_snp = 5, cores = 1)