summarize {BGData} | R Documentation |
Generates Various Summary Statistics
Description
Computes the frequency of missing values, the (minor) allele frequency, and
standard deviation of each column of X.
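As a rough illustration of the three statistics, the sketch below computes them for a small in-memory matrix using base R only. It assumes genotypes are coded as allele counts (0, 1, or 2) and that the allele frequency is the mean count divided by 2. The helper summarize_sketch is hypothetical and not part of BGData, and the package's own implementation may differ in detail.

```r
# Toy genotype matrix: 4 individuals (rows) x 3 markers (columns),
# coded as counts of one allele (0, 1, or 2); NA marks a missing call.
# matrix() fills column by column, so each row of values below is one marker.
X <- matrix(
    c(0, 1, 2, NA,
      1, 1, NA, NA,
      2, 2, 2, 2),
    nrow = 4,
    dimnames = list(NULL, c("snp1", "snp2", "snp3"))
)

# Hypothetical helper (not part of BGData)
summarize_sketch <- function(X) {
    data.frame(
        # Proportion of missing values per column
        freq_na = colMeans(is.na(X)),
        # Mean allele count divided by 2 (assumes diploid 0/1/2 coding)
        allele_freq = colMeans(X, na.rm = TRUE) / 2,
        # Standard deviation of the non-missing values per column
        sd = apply(X, 2, sd, na.rm = TRUE),
        row.names = colnames(X)
    )
}

res <- summarize_sketch(X)
print(res)
```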
Usage
summarize(X, i = seq_len(nrow(X)), j = seq_len(ncol(X)),
    chunkSize = 5000L, nCores = getOption("mc.cores", 2L),
    verbose = FALSE)
Arguments
X |
A matrix-like object, typically the genotypes of a |
i |
Indicates which rows of |
j |
Indicates which columns of |
chunkSize |
The number of columns of |
nCores |
The number of cores (passed to |
verbose |
Whether progress updates will be posted. Defaults to |
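To make the role of chunkSize more concrete, the following sketch splits the selected column indices into consecutive chunks and processes one chunk at a time. It assumes chunkSize bounds how many columns are handled in a single step. It runs sequentially with lapply and is not the package's code, and all variable names are illustrative.

```r
set.seed(1)

# Simulated genotypes: 20 individuals x 12 markers, coded 0/1/2
X <- matrix(rbinom(20 * 12, size = 2, prob = 0.3), nrow = 20)

j <- seq_len(ncol(X))  # columns to process
chunkSize <- 5L        # illustrative value, far below the default of 5000L

# Split the column indices into consecutive chunks of at most chunkSize
chunks <- split(j, ceiling(seq_along(j) / chunkSize))

# Process one chunk at a time; only that block of columns is subset per step
parts <- lapply(chunks, function(cols) {
    colMeans(X[, cols, drop = FALSE]) / 2
})

# Concatenate the per-chunk results in the original column order
allele_freq <- unlist(parts, use.names = FALSE)
```

With 12 columns and a chunk size of 5, this produces chunks of 5, 5, and 2 columns, and the combined result matches what a single pass over all columns would give.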
Value
A data.frame with three columns: freq_na for frequencies of missing values,
allele_freq for allele frequencies of the counted allele, and sd for standard
deviations.
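One common use of such a table is marker quality control. The sketch below builds a hypothetical data.frame with the three documented columns and filters markers on missingness and minor allele frequency. The values, the marker names used as row names, and both thresholds are assumptions chosen for illustration and are not package defaults.

```r
# Hypothetical summary table with the three documented columns
res <- data.frame(
    freq_na = c(0.00, 0.12, 0.01, 0.03),
    allele_freq = c(0.30, 0.45, 0.995, 0.80),
    sd = c(0.65, 0.70, 0.10, 0.57),
    row.names = c("snp1", "snp2", "snp3", "snp4")
)

# Fold allele frequencies so they refer to the less common allele
maf <- pmin(res$allele_freq, 1 - res$allele_freq)

# Illustrative thresholds (assumptions, not package defaults):
# at most 5% missing calls and a minor allele frequency of at least 1%
keep <- res$freq_na <= 0.05 & maf >= 0.01
kept_markers <- rownames(res)[keep]
```

Here snp2 is dropped for excessive missingness and snp3 for a very low minor allele frequency, leaving snp1 and snp4.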
See Also
file-backed-matrices for more information on file-backed matrices.
multi-level-parallelism for more information on multi-level parallelism.
BGData-class for more information on the BGData class.
Examples
# Restrict number of cores to 1 on Windows
if (.Platform$OS.type == "windows") {
    options(mc.cores = 1)
}
# Load example data
bg <- BGData:::loadExample()
# Summarize the whole dataset
sum1 <- summarize(X = geno(bg))
# Summarize the first 50 individuals
sum2 <- summarize(X = geno(bg), i = 1:50)
# Summarize the first 100 markers (useful for distributed computing)
sum3 <- summarize(X = geno(bg), j = 1:100)
# Summarize the first 50 individuals on the first 100 markers
sum4 <- summarize(X = geno(bg), i = 1:50, j = 1:100)
# Summarize by names
sum5 <- summarize(X = geno(bg), j = c("snp81233_C", "snp81234_C", "snp81235_T"))
# Convert to minor allele frequencies (useful if the counted alleles are not
# the minor alleles)
maf <- ifelse(sum1$allele_freq > 0.5, 1 - sum1$allele_freq, sum1$allele_freq)