summarize {BGData}    R Documentation

Generates Various Summary Statistics

Description

Computes the frequency of missing values, the (minor) allele frequency, and the standard deviation for each column of X.

Usage

summarize(X, i = seq_len(nrow(X)), j = seq_len(ncol(X)),
  chunkSize = 5000L, nCores = getOption("mc.cores", 2L),
  verbose = FALSE)

Arguments

X

A matrix-like object, typically the genotypes of a BGData object.

i

Indicates which rows of X should be used. Can be integer, boolean, or character. By default, all rows are used.

j

Indicates which columns of X should be used. Can be integer, boolean, or character. By default, all columns are used.

chunkSize

The number of columns of X that are brought into physical memory for processing per core. If NULL, all elements in j are used. Defaults to 5000.
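
The chunking idea behind this argument can be sketched in base R as follows. This only illustrates the pattern on a small in-memory matrix and is not the package's implementation; X_toy, chunk_size, and the per-chunk function are hypothetical names chosen for the example.

```r
# Hypothetical in-memory stand-in for a large file-backed matrix
X_toy <- matrix(rep(c(0, 1, 2), length.out = 60), nrow = 6, ncol = 10)

j <- seq_len(ncol(X_toy))   # columns to process
chunk_size <- 4L            # columns handled at a time (hypothetical value)

# Split the column indices into consecutive chunks of at most chunk_size
chunks <- split(j, ceiling(seq_along(j) / chunk_size))

# Process one chunk at a time, then combine the per-chunk results
col_means <- unlist(lapply(chunks, function(cols) {
    block <- X_toy[, cols, drop = FALSE]  # only this block is subset
    colMeans(block, na.rm = TRUE)
}), use.names = FALSE)
```

With 10 columns and a chunk size of 4, the indices are split into three chunks of 4, 4, and 2 columns. A smaller chunk size lowers peak memory use at the cost of more subsetting operations.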

nCores

The number of cores (passed to mclapply). Defaults to the value of the mc.cores option, or 2 if that option is not set.

verbose

Whether progress updates will be posted. Defaults to FALSE.

Value

A data.frame with three columns: freq_na for frequencies of missing values, allele_freq for allele frequencies of the counted allele, and sd for standard deviations.
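
As a rough illustration of what the three columns mean, the sketch below computes comparable statistics in base R on a small toy matrix. It assumes genotypes are coded as counts (0, 1, 2) of one allele, so that the allele frequency is half the column mean. This coding and the toy matrix X_toy are assumptions made for illustration, not a description of how summarize works internally.

```r
# Toy genotype matrix: 4 individuals (rows) x 2 markers (columns),
# coded as allele counts 0/1/2 with one missing value (assumed coding)
X_toy <- matrix(c(0, 1, 2, NA,
                  2, 2, 1, 1), nrow = 4,
                dimnames = list(NULL, c("snp1", "snp2")))

toy_summary <- data.frame(
    # share of missing values per column
    freq_na = colMeans(is.na(X_toy)),
    # mean allele count divided by 2 (two alleles per individual)
    allele_freq = colMeans(X_toy, na.rm = TRUE) / 2,
    # per-column standard deviation, ignoring missing values
    sd = apply(X_toy, 2, stats::sd, na.rm = TRUE)
)
```

Here snp1 has one missing value out of four, so its freq_na is 0.25, and its remaining counts 0, 1, 2 give an allele frequency of 0.5.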

See Also

file-backed-matrices for more information on file-backed matrices. multi-level-parallelism for more information on multi-level parallelism. BGData-class for more information on the BGData class.

Examples

# Restrict number of cores to 1 on Windows
if (.Platform$OS.type == "windows") {
    options(mc.cores = 1)
}

# Load example data
bg <- BGData:::loadExample()

# Summarize the whole dataset
sum1 <- summarize(X = geno(bg))

# Summarize the first 50 individuals
sum2 <- summarize(X = geno(bg), i = 1:50)

# Summarize the first 100 markers (useful for distributed computing)
sum3 <- summarize(X = geno(bg), j = 1:100)

# Summarize the first 50 individuals on the first 100 markers
sum4 <- summarize(X = geno(bg), i = 1:50, j = 1:100)

# Summarize by names
sum5 <- summarize(X = geno(bg), j = c("snp81233_C", "snp81234_C", "snp81235_T"))

# Convert to minor allele frequencies (useful if the counted alleles are not
# the minor alleles)
maf <- ifelse(sum1$allele_freq > 0.5, 1 - sum1$allele_freq, sum1$allele_freq)
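
The same folding onto the minor allele can also be written with pmin. The sketch below uses a made-up vector so that the result is easy to check by eye; af is a hypothetical stand-in for sum1$allele_freq.

```r
# Hypothetical allele frequencies of the counted allele
af <- c(0.10, 0.50, 0.85, NA)

# The minor allele frequency is the smaller of af and 1 - af;
# missing values stay missing
maf_toy <- pmin(af, 1 - af)
```

Both forms leave missing frequencies as NA, so markers without a usable frequency still need to be filtered separately if required.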

[Package BGData version 2.4.1 Index]