getG {BGData} | R Documentation |
Computes a Genomic Relationship Matrix
Description
Computes a positive semi-definite symmetric genomic relationship matrix
G = XX', with options for centering and scaling the columns of X
beforehand.
Usage
getG(X, center = TRUE, scale = TRUE, impute = TRUE, scaleG = TRUE,
minVar = 1e-05, i = seq_len(nrow(X)), j = seq_len(ncol(X)), i2 = NULL,
chunkSize = 5000L, nCores = getOption("mc.cores", 2L), verbose = FALSE)
Arguments
X |
A matrix-like object, typically the genotypes of a BGData object.
center |
Either a logical value or a numeric vector of length equal to the
number of columns of X. Defaults to TRUE.
scale |
Either a logical value or a numeric vector of length equal to the
number of columns of X. Defaults to TRUE.
impute |
Indicates whether missing values should be imputed. Defaults to
TRUE.
scaleG |
Whether XX' should be scaled. Defaults to TRUE.
minVar |
Columns with variance lower than this value will not be used in the
computation (only if scale is not FALSE). Defaults to 1e-05.
i |
Indicates which rows of X should be used. By default, all rows are used.
j |
Indicates which columns of X should be used. By default, all columns are used.
i2 |
Indicates which rows should be used to compute a block of the genomic
relationship matrix. Will compute XY' where X is determined by i and j
and Y by i2 and j. If i2 is used, center and scale need to be precomputed.
Defaults to NULL.
chunkSize |
The number of columns of X that are brought into memory for processing
at a time. Defaults to 5000.
nCores |
The number of cores (passed to mclapply). Defaults to
getOption("mc.cores", 2L).
verbose |
Whether progress updates will be posted. Defaults to FALSE.
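The role of chunkSize can be pictured with a small base R sketch. It assumes, for illustration only, that XX' is accumulated as a sum of per-chunk products over blocks of columns; the package's actual internals may differ. The data are simulated and chunk_size is a hypothetical stand-in for chunkSize.

```r
set.seed(1)
# Simulated genotypes (0/1/2): 8 individuals, 23 markers; purely illustrative
X <- matrix(rbinom(8 * 23, size = 2, prob = 0.4), nrow = 8)

chunk_size <- 5L  # hypothetical stand-in for the chunkSize argument
# Split the column indices into consecutive chunks of at most chunk_size columns
chunks <- split(seq_len(ncol(X)), ceiling(seq_len(ncol(X)) / chunk_size))

# Accumulate XX' one column chunk at a time
G_chunked <- matrix(0, nrow(X), nrow(X))
for (cols in chunks) {
  G_chunked <- G_chunked + tcrossprod(X[, cols, drop = FALSE])
}

# Processing all columns at once gives the same matrix
G_full <- tcrossprod(X)
all.equal(G_chunked, G_full)
```

Because XX' is a sum over columns, only one chunk of columns needs to be held in memory at any time, which is what makes this pattern attractive for file-backed matrices.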
Details
If center = FALSE, scale = FALSE and scaleG = FALSE, getG produces the
same outcome as tcrossprod.
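As a rough illustration of what the options amount to, the following base R sketch contrasts the plain product XX' with a centered and scaled variant. It does not call getG. The data are simulated, and the rule used to scale G (dividing by the number of columns used) is an assumption made for this example, not a statement about the package's implementation.

```r
set.seed(2)
# Simulated genotypes (0/1/2): 10 individuals, 40 markers; purely illustrative
X <- matrix(rbinom(10 * 40, size = 2, prob = 0.3), nrow = 10)

# No centering, no scaling, no scaling of G: the plain product XX'
G_raw <- tcrossprod(X)

# Drop near-constant columns first, mirroring the role of minVar
keep <- apply(X, 2, var) >= 1e-05
# Center each remaining column to mean 0 and scale it to unit standard deviation
W <- scale(X[, keep, drop = FALSE], center = TRUE, scale = TRUE)

# ASSUMPTION: G is scaled here by dividing by the number of columns used
G_std <- tcrossprod(W) / ncol(W)

# Both matrices are symmetric, and G_std has no negative eigenvalues
# beyond numerical noise
isSymmetric(G_raw)
isSymmetric(G_std)
min(eigen(G_std, symmetric = TRUE, only.values = TRUE)$values) > -1e-8
```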
Value
A positive semi-definite symmetric numeric matrix.
See Also
file-backed-matrices for more information on file-backed matrices.
multi-level-parallelism for more information on multi-level parallelism.
BGData-class for more information on the BGData class.
Examples
# Restrict number of cores to 1 on Windows
if (.Platform$OS.type == "windows") {
options(mc.cores = 1)
}
# Load example data
bg <- BGData:::loadExample()
# Compute a scaled genomic relationship matrix from centered and scaled
# genotypes
g1 <- getG(X = geno(bg))
# Disable scaling of G
g2 <- getG(X = geno(bg), scaleG = FALSE)
# Disable centering of genotypes
g3 <- getG(X = geno(bg), center = FALSE)
# Disable scaling of genotypes
g4 <- getG(X = geno(bg), scale = FALSE)
# Provide own scales
scales <- chunkedApply(X = geno(bg), MARGIN = 2, FUN = sd)
g4 <- getG(X = geno(bg), scale = scales)
# Provide own centers
centers <- chunkedApply(X = geno(bg), MARGIN = 2, FUN = mean)
g5 <- getG(X = geno(bg), center = centers)
# Only use the first 50 individuals (useful to account for population structure)
g6 <- getG(X = geno(bg), i = 1:50)
# Only use the first 100 markers (useful to ignore some markers)
g7 <- getG(X = geno(bg), j = 1:100)
# Compute unscaled G matrix by combining blocks of $XX_{i2}'$ where $X_{i2}$ is
# a horizontal partition of X. This is useful for distributed computing as each
# block can be computed in parallel. Centers and scales need to be precomputed.
block1 <- getG(X = geno(bg), i2 = 1:100, center = centers, scale = scales)
block2 <- getG(X = geno(bg), i2 = 101:199, center = centers, scale = scales)
g8 <- cbind(block1, block2)
# Compute unscaled G matrix by combining blocks of $X_{i}X_{i2}'$ where both
# $X_{i}$ and $X_{i2}$ are horizontal partitions of X. Similarly to the example
# above, this is useful for distributed computing, in particular to compute
# very large G matrices. Centers and scales need to be precomputed. This
# approach is similar to the one taken by the symDMatrix package, but the
# symDMatrix package adds memory-mapped blocks, stores only the upper triangle
# of the matrix, and provides a type that allows for indexing as if the full G
# matrix were in memory.
block11 <- getG(X = geno(bg), i = 1:100, i2 = 1:100, center = centers, scale = scales)
block12 <- getG(X = geno(bg), i = 1:100, i2 = 101:199, center = centers, scale = scales)
block21 <- getG(X = geno(bg), i = 101:199, i2 = 1:100, center = centers, scale = scales)
block22 <- getG(X = geno(bg), i = 101:199, i2 = 101:199, center = centers, scale = scales)
g9 <- rbind(
cbind(block11, block12),
cbind(block21, block22)
)
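The reason the block examples require precomputed centers and scales can be checked with base R alone. The sketch below does not use getG or the example data: the genotypes are simulated and std_rows is a hypothetical helper introduced only for this illustration. It shows that blocks built from rows standardised with shared column statistics reassemble into the same matrix as a single full computation.

```r
set.seed(3)
# Simulated genotypes (0/1/2): 12 individuals, 30 markers; purely illustrative
X <- matrix(rbinom(12 * 30, size = 2, prob = 0.5), nrow = 12)
# Guard against constant columns, which cannot be scaled
X <- X[, apply(X, 2, sd) > 1e-05, drop = FALSE]

# Column statistics computed once, over ALL rows
centers <- colMeans(X)
scales <- apply(X, 2, sd)

# Hypothetical helper (not a BGData function): standardise the given rows of X
# with the shared, precomputed centers and scales
std_rows <- function(rows) {
  sweep(sweep(X[rows, , drop = FALSE], 2, centers, "-"), 2, scales, "/")
}

a <- 1:6
b <- 7:12
block11 <- tcrossprod(std_rows(a), std_rows(a))
block12 <- tcrossprod(std_rows(a), std_rows(b))
block21 <- tcrossprod(std_rows(b), std_rows(a))
block22 <- tcrossprod(std_rows(b), std_rows(b))

G_blocks <- rbind(
  cbind(block11, block12),
  cbind(block21, block22)
)

# The assembled blocks match the product computed in one pass
G_full <- tcrossprod(std_rows(1:12))
all.equal(G_blocks, G_full)
```

If each block instead used centers and scales computed from its own rows, the blocks would be standardised inconsistently and would not fit together into one relationship matrix.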