R: Computes a Genomic Relationship Matrix

getG {BGData}

R Documentation

Computes a Genomic Relationship Matrix

Description

Computes a positive semi-definite symmetric genomic relationship matrix G=XX', offering options for centering and scaling the columns of X beforehand.

Usage

getG(X, center = TRUE, scale = TRUE, impute = TRUE, scaleG = TRUE,
  minVar = 1e-05, i = seq_len(nrow(X)), j = seq_len(ncol(X)), i2 = NULL,
  chunkSize = 5000L, nCores = getOption("mc.cores", 2L), verbose = FALSE)

Arguments

`X`	A matrix-like object, typically the genotypes of a `BGData` object.
`center`	Either a logical value or a numeric vector of length equal to the number of columns of `X`. Numeric vector required if `i2` is used. If `FALSE`, no centering is done. Defaults to `TRUE`.
`scale`	Either a logical value or a numeric vector of length equal to the number of columns of `X`. Numeric vector required if `i2` is used. If `FALSE`, no scaling is done. Defaults to `TRUE`.
`impute`	Indicates whether missing values should be imputed. Defaults to `TRUE`.
`scaleG`	Whether XX' should be scaled. Defaults to `TRUE`.
`minVar`	Columns with variance lower than this value will not be used in the computation (only if `scale` is not `FALSE`).
`i`	Indicates which rows of `X` should be used. Can be integer, boolean, or character. By default, all rows are used.
`j`	Indicates which columns of `X` should be used. Can be integer, boolean, or character. By default, all columns are used.
`i2`	Indicates which rows should be used to compute a block of the genomic relationship matrix. Will compute XY' where X is determined by `i` and `j` and Y by `i2` and `j`. Can be integer, boolean, or character. If `NULL`, the whole genomic relationship matrix XX' is computed. Defaults to `NULL`.
`chunkSize`	The number of columns of `X` that are brought into physical memory for processing per core. If `NULL`, all columns of `X` are used. Defaults to 5000.
`nCores`	The number of cores (passed to `mclapply`). Defaults to the value of the `mc.cores` option, or 2 if that option is not set.
`verbose`	Whether progress updates will be posted. Defaults to `FALSE`.
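
The effect of `minVar` can be pictured with a small base R sketch. It only illustrates the rule stated above (columns whose variance is below the threshold are left out) and is not taken from the `getG` source; the toy matrix and the names `keep` and `X_used` are made up for illustration.

```r
# Toy genotype matrix (made-up data): 4 individuals x 3 markers.
# The second marker is constant, so its variance is 0.
X <- cbind(c(0, 1, 2, 1),
           c(1, 1, 1, 1),
           c(2, 0, 1, 2))

minVar <- 1e-05

# Keep only the columns whose variance is not lower than minVar
keep <- apply(X, 2, var) >= minVar
X_used <- X[, keep, drop = FALSE]
```

Constant (monomorphic) markers cannot be scaled by their standard deviation, which is presumably why the filter applies only when scaling is enabled.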

Details

If `center = FALSE`, `scale = FALSE`, and `scaleG = FALSE`, `getG` produces the same result as `tcrossprod`.
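
As a rough illustration in base R only, the sketch below computes the plain XX' case with `tcrossprod` and then one possible centered and scaled variant. Dividing by the number of markers is an assumption (a common convention), not something this page confirms about `scaleG`; the toy data and all variable names are hypothetical.

```r
# Toy genotype matrix (made-up data): 6 individuals x 4 markers coded 0/1/2
X <- matrix(c(0, 1, 2, 1, 0, 2,
              2, 2, 1, 0, 1, 0,
              1, 0, 0, 2, 2, 1,
              0, 2, 1, 1, 0, 2), nrow = 6, ncol = 4)

# No centering, no scaling of columns, no scaling of G: plain XX'
G_raw <- tcrossprod(X)

# ASSUMPTION: standardize each column, then divide XX' by the number of markers
Z <- scale(X, center = TRUE, scale = TRUE)
G_std <- tcrossprod(Z) / ncol(Z)

# Both properties promised in the Value section can be checked numerically
is_symmetric <- isSymmetric(G_std)
min_eigen <- min(eigen(G_std, symmetric = TRUE, only.values = TRUE)$values)
```

Here `min_eigen` should be zero up to floating-point error, since a matrix of the form ZZ' cannot have negative eigenvalues.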

Value

A positive semi-definite symmetric numeric matrix.

Examples

# Restrict number of cores to 1 on Windows
if (.Platform$OS.type == "windows") {
    options(mc.cores = 1)
}

# Load example data
bg <- BGData:::loadExample()

# Compute a scaled genomic relationship matrix from centered and scaled
# genotypes
g1 <- getG(X = geno(bg))

# Disable scaling of G
g2 <- getG(X = geno(bg), scaleG = FALSE)

# Disable centering of genotypes
g3 <- getG(X = geno(bg), center = FALSE)

# Disable scaling of genotypes
g4 <- getG(X = geno(bg), scale = FALSE)

# Provide own scales
scales <- chunkedApply(X = geno(bg), MARGIN = 2, FUN = sd)
g4b <- getG(X = geno(bg), scale = scales)

# Provide own centers
centers <- chunkedApply(X = geno(bg), MARGIN = 2, FUN = mean)
g5 <- getG(X = geno(bg), center = centers)

# Only use the first 50 individuals (useful to account for population structure)
g6 <- getG(X = geno(bg), i = 1:50)

# Only use the first 100 markers (useful to ignore some markers)
g7 <- getG(X = geno(bg), j = 1:100)

# Compute unscaled G matrix by combining blocks of $XX_{i2}'$ where $X_{i2}$ is
# a horizontal partition of X. This is useful for distributed computing as each
# block can be computed in parallel. Centers and scales need to be precomputed.
block1 <- getG(X = geno(bg), i2 = 1:100, center = centers, scale = scales)
block2 <- getG(X = geno(bg), i2 = 101:199, center = centers, scale = scales)
g8 <- cbind(block1, block2)

# Compute unscaled G matrix by combining blocks of $X_{i}X_{i2}'$ where both
# $X_{i}$ and $X_{i2}$ are horizontal partitions of X. Similarly to the example
# above, this is useful for distributed computing, in particular to compute
# very large G matrices. Centers and scales need to be precomputed. This
# approach is similar to the one taken by the symDMatrix package, but the
# symDMatrix package adds memory-mapped blocks, stores only the upper triangle
# of the symmetric matrix, and provides a type that allows for indexing as if
# the full G matrix is in memory.
block11 <- getG(X = geno(bg), i = 1:100, i2 = 1:100, center = centers, scale = scales)
block12 <- getG(X = geno(bg), i = 1:100, i2 = 101:199, center = centers, scale = scales)
block21 <- getG(X = geno(bg), i = 101:199, i2 = 1:100, center = centers, scale = scales)
block22 <- getG(X = geno(bg), i = 101:199, i2 = 101:199, center = centers, scale = scales)
g9 <- rbind(
    cbind(block11, block12),
    cbind(block21, block22)
)
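
The block examples above rely on every block using the same centers and scales. A minimal base R sketch of that idea follows; it uses `tcrossprod` in place of `getG`, so it only shows the linear algebra and not the package's implementation. The toy data and the helper names `std` and `block` are hypothetical.

```r
# Toy genotype matrix (made-up data): 10 individuals x 8 markers coded 0/1/2
X <- matrix(seq_len(80) %% 3, nrow = 10, ncol = 8)

# Centers and scales are computed once, from all individuals
centers <- colMeans(X)
scales <- apply(X, 2, sd)

# Standardize a subset of rows with the shared centers and scales
std <- function(rows) {
    Xs <- X[rows, , drop = FALSE]
    sweep(sweep(Xs, 2, centers, "-"), 2, scales, "/")
}

# One block of the matrix: rows i against rows i2
block <- function(i, i2) tcrossprod(std(i), std(i2))

rows1 <- 1:5
rows2 <- 6:10
G_blocks <- rbind(
    cbind(block(rows1, rows1), block(rows1, rows2)),
    cbind(block(rows2, rows1), block(rows2, rows2))
)

# Reference: the same matrix computed in one piece
G_full <- tcrossprod(std(1:10))
blocks_match <- isTRUE(all.equal(G_blocks, G_full))
```

If each block instead derived centers and scales from its own rows, the blocks would be standardized differently and would no longer fit together into one consistent matrix.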

[Package BGData version 2.4.1 Index]