sim_trait {simtrait}    R Documentation
Simulate a complex trait from genotypes
Description
Simulate a complex trait given a SNP genotype matrix and model parameters. The minimal parameters are the number of causal loci, the heritability, and either the true ancestral allele frequencies used to generate the genotypes or the mean kinship of all individuals.
An optional minimum marginal allele frequency for the causal loci can be set.
By default the output trait has zero mean and unit variance (for outbred individuals), but both parameters can be changed (see mu and sigma_sq).
The code selects random loci to be causal, constructs coefficients for these loci (scaled appropriately), and draws independent Normal non-genetic effects, plus random group effects if specified.
There are two models for constructing causal coefficients: random coefficients (RC; default) and fixed effect sizes (FES; i.e., coefficients roughly inversely proportional to allele frequency; use fes = TRUE).
Suppose there are m loci and n individuals.
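The construction described above can be sketched in base R. This is a minimal sketch, not simtrait's implementation: the helper sketch_trait() is hypothetical, and it assumes standard Normal coefficients for RC, equal per-locus variance with random signs for FES, genotypes centered at 2 * p_anc, and maf_cut applied to the sample minor allele frequency. It covers only the case where p_anc is known and ignores group effects.

```r
# Hypothetical sketch, NOT simtrait's internal code: additive trait model with
# known ancestral allele frequencies p_anc. X is an m x n genotype matrix
# (loci on rows, individuals on columns) with entries in {0, 1, 2}.
sketch_trait <- function(X, m_causal, herit, p_anc, mu = 0, sigma_sq = 1,
                         maf_cut = NA, fes = FALSE) {
  n <- ncol(X)
  candidates <- seq_len(nrow(X))
  # optionally exclude loci whose sample minor allele frequency is below maf_cut
  if (!is.na(maf_cut)) {
    p_hat <- rowMeans(X) / 2
    candidates <- candidates[pmin(p_hat, 1 - p_hat) >= maf_cut]
  }
  stopifnot(length(candidates) >= m_causal)
  # pick causal loci at random (sample.int avoids sample()'s length-one pitfall)
  causal <- sort(candidates[sample.int(length(candidates), m_causal)])
  p <- p_anc[causal]
  # genotype variance of each causal locus in an outbred individual
  het <- 2 * p * (1 - p)
  if (fes) {
    # FES (assumed form): every causal locus explains equal variance, random signs
    beta <- sample(c(-1, 1), m_causal, replace = TRUE) / sqrt(het)
  } else {
    # RC (assumed form): standard Normal coefficients
    beta <- rnorm(m_causal)
  }
  # rescale so the genetic variance sum(beta^2 * het) equals herit * sigma_sq
  beta <- beta * sqrt(herit * sigma_sq / sum(beta^2 * het))
  # center genotypes at their expectation 2 * p before applying coefficients
  genetic <- drop(crossprod(X[causal, , drop = FALSE] - 2 * p, beta))
  # independent Normal non-genetic effects carry the remaining variance
  noise <- rnorm(n, sd = sqrt((1 - herit) * sigma_sq))
  list(trait = mu + genetic + noise, causal_indexes = causal, causal_coeffs = beta)
}
```

With the dummy X from Examples, sketch_trait(X, m_causal = 2, herit = 0.8, p_anc = c(0.5, 0.6, 0.2)) mirrors the first sim_trait() call there. Under FES, rarer loci (smaller p * (1 - p)) receive larger coefficients, which is the sense in which FES coefficients are inversely related to allele frequency.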
Usage
sim_trait(
X,
m_causal,
herit,
p_anc = NULL,
kinship = NULL,
mu = 0,
sigma_sq = 1,
labs = NULL,
labs_sigma_sq = NULL,
maf_cut = NA,
loci_on_cols = FALSE,
m_chunk_max = 1000,
fes = FALSE
)
Arguments
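X: The genotype matrix, with the m loci along rows and the n individuals along columns (the default, loci_on_cols = FALSE), or transposed with loci along columns (loci_on_cols = TRUE). May also be a BEDMatrix object (see m_chunk_max).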
m_causal: The desired number of causal loci.
herit: The desired heritability (proportion of trait variance due to genetics).
p_anc: The length-m vector of true ancestral allele frequencies. Either this or kinship must be specified (see Details).
kinship: The mean kinship value of the individuals in the data. Either this or p_anc must be specified (see Details).
mu: The desired parametric mean value of the trait (scalar, default 0).
sigma_sq: The desired parametric variance factor of the trait (scalar, default 1). Corresponds to the variance of an outbred individual.
labs: Optional labels assigning individuals to groups, to simulate group effects.
If a vector, its length must equal the number of individuals.
If a matrix, individuals must be along rows and levels along columns (for multiple levels of group effects).
The levels are not required to be nested (as the name may falsely imply).
Values can be numeric or strings; individuals given the same value are placed in the same group.
If this is non-NULL, then labs_sigma_sq must also be specified.
labs_sigma_sq: Optional vector of group effect variance proportions, one value for each level given in labs (one per column if labs is a matrix).
maf_cut: The optional minimum allele frequency threshold for causal loci (default NA, no threshold).
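loci_on_cols: If TRUE, X has loci along columns and individuals along rows; if FALSE (default), loci are along rows and individuals along columns.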
m_chunk_max: BEDMatrix-specific; sets the maximum number of loci to process at a time. If memory usage is excessive, set this lower than the default (expected only for extremely large numbers of individuals).
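fes: If TRUE, causal coefficients are constructed under the fixed effect sizes (FES) model; if FALSE (default), the random coefficients (RC) model is used.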
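Group effects as described for labs and labs_sigma_sq could be generated as below. This is a hypothetical sketch (sketch_group_effects() is not a simtrait function) that assumes each distinct label at level k receives one independent Normal effect with variance labs_sigma_sq[k] * sigma_sq, shared by every individual carrying that label. Under that assumption, the independent non-genetic effects would carry the remaining proportion 1 - herit - sum(labs_sigma_sq) of sigma_sq.

```r
# Hypothetical sketch (not simtrait code): group effects from labels.
# labs: vector (one level) or matrix (individuals on rows, levels on columns).
# labs_sigma_sq: one variance proportion per level (column of labs).
sketch_group_effects <- function(labs, labs_sigma_sq, sigma_sq = 1) {
  labs <- as.matrix(labs)  # a vector becomes a one-column matrix
  stopifnot(ncol(labs) == length(labs_sigma_sq))
  effects <- numeric(nrow(labs))
  for (k in seq_len(ncol(labs))) {
    groups <- factor(labs[, k])
    # one independent Normal effect per distinct group at this level
    g <- rnorm(nlevels(groups), sd = sqrt(labs_sigma_sq[k] * sigma_sq))
    # individuals sharing a label share that effect; levels add up
    effects <- effects + g[as.integer(groups)]
  }
  effects
}
```

Because each level is handled separately and the effects are summed, the levels need not be nested, consistent with the note under labs.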
Details
To center and scale the trait and the locus coefficients vector correctly to the desired parameters (mean, variance, heritability), the parametric ancestral allele frequencies (p_anc) must be known.
This is necessary because in the heritability model the genotypes are random variables (with means given by p_anc and a covariance structure given by p_anc and the kinship matrix), so these genotype distribution parameters are required.
If p_anc is known (true for simulated genotypes), then the trait will have the specified mean and covariance matrix, in agreement with cov_trait().
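For reference, the covariance commonly implied by this kinship-based model, for an n x n kinship matrix Phi, is V = sigma_sq * (2 * herit * Phi + (1 - herit) * I). The sketch below assumes that form and omits group effects; it is not cov_trait()'s code, and sketch_cov_trait() is hypothetical. Since an outbred individual has self-kinship 1/2, its diagonal entry reduces to sigma_sq, matching the description of sigma_sq above.

```r
# Hypothetical sketch (not simtrait's cov_trait()): trait covariance under the
# assumed model V = sigma_sq * (2 * herit * kinship + (1 - herit) * I),
# ignoring group effects. kinship is an n x n kinship matrix.
sketch_cov_trait <- function(kinship, herit, sigma_sq = 1) {
  n <- nrow(kinship)
  sigma_sq * (2 * herit * kinship + (1 - herit) * diag(n))
}

# three unrelated outbred individuals: self-kinship 1/2, no shared kinship
V <- sketch_cov_trait(diag(0.5, 3), herit = 0.8)
```

Inbreeding raises self-kinship above 1/2 and therefore raises an individual's trait variance above sigma_sq, while kinship between individuals induces trait covariance proportional to 2 * herit.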
To simulate traits using real genotypes, where p_anc is unknown, a compromise that works well in practice is possible if the mean kinship is known (see package vignette).
We recommend estimating the mean kinship using the popkin package!
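One reason the mean kinship helps: at a locus with ancestral frequency p, the sample allele frequency p_hat has variance p * (1 - p) * phi_bar, where phi_bar is the mean of the full n x n kinship matrix (self-kinship on the diagonal included), so E[p_hat * (1 - p_hat)] = p * (1 - p) * (1 - phi_bar). Dividing by 1 - phi_bar therefore removes the bias of the naive estimate. The hypothetical helper below applies only that correction; we assume, rather than claim, that simtrait relies on this identity, and the package vignette describes its actual approach.

```r
# Hypothetical sketch: estimate p_anc * (1 - p_anc) per locus without p_anc,
# using E[p_hat * (1 - p_hat)] = p_anc * (1 - p_anc) * (1 - kinship_mean).
# X: m x n genotype matrix (loci on rows); kinship_mean: mean of the n x n
# kinship matrix, self-kinship diagonal included.
sketch_het_from_kinship <- function(X, kinship_mean) {
  p_hat <- rowMeans(X) / 2                   # sample allele frequency per locus
  p_hat * (1 - p_hat) / (1 - kinship_mean)   # bias-corrected estimate
}
```

For n unrelated outbred individuals the kinship matrix is I / 2, so kinship_mean = 1 / (2 * n) and the correction is small; structured real populations have larger mean kinship and a correspondingly larger bias.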
Value
A named list containing:
- trait: length-n vector of the simulated trait
- causal_indexes: length-m_causal vector of causal locus indexes
- causal_coeffs: length-m_causal vector of coefficients at the causal loci
- group_effects: length-n vector of simulated group effects, or 0 (scalar) if not simulated
However, if herit = 0, then causal_indexes and causal_coeffs will have zero length regardless of m_causal.
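The documented return value can be checked programmatically. check_trait_output() below is a hypothetical helper, not part of simtrait; it tests only the shapes listed above (including the zero-length case when herit = 0) and does not check the simulated values.

```r
# Hypothetical helper (not part of simtrait): check a returned list against
# the shapes documented under Value, for n individuals.
check_trait_output <- function(obj, n, m_causal, herit) {
  stopifnot(
    is.list(obj),
    all(c("trait", "causal_indexes", "causal_coeffs", "group_effects") %in% names(obj)),
    length(obj$trait) == n,
    length(obj$causal_indexes) == length(obj$causal_coeffs),
    # group_effects is either one per individual or the scalar 0
    length(obj$group_effects) %in% c(1, n)
  )
  # with zero heritability no causal loci are reported
  expected_m <- if (herit == 0) 0 else m_causal
  stopifnot(length(obj$causal_indexes) == expected_m)
  invisible(TRUE)
}
```

For example, after the first sim_trait() call in Examples, check_trait_output(obj, n = ncol(X), m_causal = 2, herit = herit) would be expected to pass if the object matches this documentation.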
See Also
Examples
library(simtrait)
# construct a dummy genotype matrix
X <- matrix(
data = c(
0, 1, 2,
1, 2, 1,
0, 0, 1
),
nrow = 3,
byrow = TRUE
)
# made up ancestral allele frequency vector for example
p_anc <- c(0.5, 0.6, 0.2)
# made up mean kinship
kinship <- 0.2
# desired heritability
herit <- 0.8
# create simulated trait and associated data
# default is *random coefficients* (RC) model
obj <- sim_trait(X = X, m_causal = 2, herit = herit, p_anc = p_anc)
# trait vector
obj$trait
# randomly-picked causal locus indexes
obj$causal_indexes
# regression coefficients vector
obj$causal_coeffs
# *fixed effect sizes* (FES) model
obj <- sim_trait(X = X, m_causal = 2, herit = herit, p_anc = p_anc, fes = TRUE)
# either model can be applied to real data by passing `kinship` instead of `p_anc`
obj <- sim_trait(X = X, m_causal = 2, herit = herit, kinship = kinship)