read_bed {genio} | R Documentation |
Read a genotype matrix in Plink BED format
Description
This function reads genotypes encoded in a Plink-formatted BED (binary) file, returning them in a standard R matrix containing genotypes encoded numerically as dosages (values in c( 0, 1, 2, NA )).
Each genotype per locus (m loci) and individual (n total) counts the number of reference alleles, or NA for missing data.
No *.fam or *.bim files are read by this basic function.
Since BED does not encode the data dimensions internally, these values must be provided by the user.
Usage
read_bed(
file,
names_loci = NULL,
names_ind = NULL,
m_loci = NA,
n_ind = NA,
ext = "bed",
verbose = TRUE
)
Arguments
file |
Input file path.
*.bed extension may be omitted (will be added automatically if |
names_loci |
Vector of loci names, to become the row names of the genotype matrix.
If provided, its length sets |
names_ind |
Vector of individual names, to become the column names of the genotype matrix.
If provided, its length sets |
m_loci |
Number of loci in the input genotype table.
Required if |
n_ind |
Number of individuals in the input genotype table.
Required if |
ext |
The desired file extension (default "bed").
Ignored if |
verbose |
If |
Details
The code enforces several checks to validate the data given the requested dimensions. An error is thrown if the file terminates too early or does not terminate once the genotype matrix is filled. In addition, since each locus is encoded in an integer number of bytes and each byte holds up to four individuals, the last byte of a locus is padded when it holds fewer than four. To agree with other software (plink2, BEDMatrix), byte padding values are ignored (they may take on any value without causing errors).
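The size check described above follows from the layout of a locus-major BED file: three header bytes, then one block per locus with four individuals packed per byte. The following is a minimal sketch of that arithmetic. The helper names are hypothetical and are not part of genio; it is not the package's actual validation code.

```r
# Sketch only: these helper names are hypothetical, not genio functions.

# Bytes needed for one locus: four individuals per byte, rounded up.
bytes_per_locus <- function(n_ind) ceiling(n_ind / 4)

# Expected file size: 3 magic-number bytes plus one block of bytes per locus.
expected_bed_size <- function(m_loci, n_ind) 3 + m_loci * bytes_per_locus(n_ind)

# Number of padded (unused) 2-bit slots in the last byte of each locus.
padding_slots <- function(n_ind) (4 - n_ind %% 4) %% 4

expected_bed_size(10, 7)  # 3 + 10 * 2 = 23
padding_slots(7)          # 1
```

Comparing such a value against file.size() of the BED file is one way to sanity-check m_loci and n_ind before reading.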
This function only supports locus-major BED files, which are the standard for modern data. The format is validated via the BED file's magic numbers (the first three bytes of the file). Older BED files can be converted using Plink.
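As an illustration of the magic-number check, the sketch below reads the first three bytes of a file and compares them with the values given in the Plink BED format reference (0x6c 0x1b, followed by 0x01 for locus-major order). The helper is_locus_major_bed is a hypothetical name used for illustration, not a genio function, and the temporary files contain only a header.

```r
# Sketch only: `is_locus_major_bed` is a hypothetical helper, not a genio function.
is_locus_major_bed <- function(file) {
  # read the first three bytes of the file as raw values
  header <- readBin(file, what = "raw", n = 3)
  # 0x6c 0x1b identify a Plink BED file; a third byte of 0x01 marks locus-major order
  identical(header, as.raw(c(0x6c, 0x1b, 0x01)))
}

# demonstrate on small temporary files that contain only a header
tmp_ok <- tempfile(fileext = ".bed")
writeBin(as.raw(c(0x6c, 0x1b, 0x01)), tmp_ok)
tmp_old <- tempfile(fileext = ".bed")
writeBin(as.raw(c(0x6c, 0x1b, 0x00)), tmp_old)  # individual-major header

is_locus_major_bed(tmp_ok)   # TRUE
is_locus_major_bed(tmp_old)  # FALSE
```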
Value
The m-by-n genotype matrix.
See Also
read_plink() for reading a set of BED/BIM/FAM files.
geno_to_char() for translating numerical genotypes into more human-readable character encodings.
Plink BED format reference: https://www.cog-genomics.org/plink/1.9/formats#bed
Examples
# first obtain data dimensions from BIM and FAM files
# all file paths
file_bed <- system.file("extdata", 'sample.bed', package = "genio", mustWork = TRUE)
file_bim <- system.file("extdata", 'sample.bim', package = "genio", mustWork = TRUE)
file_fam <- system.file("extdata", 'sample.fam', package = "genio", mustWork = TRUE)
# read annotation tables
bim <- read_bim(file_bim)
fam <- read_fam(file_fam)
# read an existing Plink *.bed file
# pass locus and individual IDs as vectors, setting data dimensions too
X <- read_bed(file_bed, bim$id, fam$id)
X
# can specify without extension
file_bed <- sub('\\.bed$', '', file_bed) # remove extension from this path on purpose
file_bed # verify .bed is missing
X <- read_bed(file_bed, bim$id, fam$id) # loads too!
X