snp_readBGEN {bigsnpr} | R Documentation |
Read BGEN files into a "bigSNP"
Description
Function to read the UK Biobank BGEN files into a bigSNP.
Usage
snp_readBGEN(
  bgenfiles,
  backingfile,
  list_snp_id,
  ind_row = NULL,
  bgi_dir = dirname(bgenfiles),
  read_as = c("dosage", "random"),
  ncores = 1
)
Arguments
bgenfiles |
Character vector of paths to files with extension ".bgen". The corresponding ".bgen.bgi" index files must exist. |
backingfile |
The path (without extension) for the backing files (".bk" and ".rds") that are created by this function for storing the bigSNP object. |
list_snp_id |
List (same length as the number of BGEN files) of
character vectors of SNP IDs to read. These should be in the form
|
ind_row |
An optional vector of the row indices (individuals) that
are used. If not specified, all rows are used. Don't use negative indices.
You can access the sample IDs corresponding to the genotypes from the .sample
file, and use e.g. |
bgi_dir |
Directory of index files. Default is the same as |
read_as |
How to read BGEN probabilities? Currently implemented:
|
ncores |
Number of cores used. Default doesn't use parallelism.
You may use |
Details
For more information on this format, please visit the BGEN webpage.
This function is designed to read UK Biobank imputation files. It assumes
that variants have been compressed with zlib, that there are only 2 possible
alleles, and that each probability is stored on 8 bits. For example, if you
use qctool to generate your own BGEN files, please make sure you are using
options '-ofiletype bgen_v1.2 -bgen-bits 8'.
If the format is not the expected one, this will result in an error or even
a crash of your R session. Another common source of errors is corrupted
files; e.g. if using UK Biobank files, compare the output of tools::md5sum()
with the checksums at https://biobank.ndph.ox.ac.uk/ukb/refer.cgi?id=998.
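The checksum comparison mentioned above can be sketched as follows. This is only an illustration: check_md5() is a hypothetical helper (not part of bigsnpr), and a small temporary file stands in for a real BGEN file, so the "published" checksum here is simply computed from that file.

```r
# Hypothetical helper (not part of bigsnpr): TRUE where a file exists and its
# MD5 checksum matches the expected one (comparison is case-insensitive).
check_md5 <- function(files, expected) {
  observed <- unname(tools::md5sum(files))  # NA for files that do not exist
  !is.na(observed) & tolower(observed) == tolower(expected)
}

# Demonstration on a temporary file standing in for a BGEN file
tmp <- tempfile(fileext = ".bgen")
writeBin(as.raw(1:10), tmp)
good <- unname(tools::md5sum(tmp))  # pretend this is the published checksum
bad  <- strrep("0", 32)             # a checksum that does not match this file

check_md5(tmp, good)  # TRUE
check_md5(tmp, bad)   # FALSE
```

With real data, `expected` would hold the checksums listed on the UK Biobank page, in the same order as `files`.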
You can look at some example code from my papers on how to use this function:
- https://github.com/privefl/paper-misspec/blob/main/code/prepare-genotypes.R
- https://github.com/privefl/paper-ldpred2/blob/master/code/prepare-genotypes.R#L1-L62
- https://github.com/privefl/paper4-bedpca/blob/master/code/missing-values-UKBB.R#L34-L75
Value
The path to the RDS file <backingfile>.rds that stores the bigSNP object
created by this function. Note that this function creates another file (.bk)
which stores the values of the FBM ($genotypes). The $map component of the
bigSNP object stores some information on the variants (including allele
frequencies and INFO scores computed from the probabilities). However, the
bigSNP object does not have a $fam component; you should use the individual
IDs in the .sample file (filtered with ind_row) to add external information
on the individuals.
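One possible way to derive ind_row from a .sample file is sketched below. It assumes the usual Oxford-style layout (a header line followed by a line of column types); the file contents, the IDs, and the variable names are made up for illustration.

```r
# Write a toy .sample file (made-up IDs) to a temporary location
sample_path <- tempfile(fileext = ".sample")
writeLines(c(
  "ID_1 ID_2 missing sex",
  "0 0 0 D",              # second line holds column types, not an individual
  "1001 1001 0 1",
  "1002 1002 0 2",
  "1003 1003 0 1",
  "1004 1004 0 2"
), sample_path)

# Read it and drop the column-type line so that rows align with genotypes
sample_df <- read.table(sample_path, header = TRUE, stringsAsFactors = FALSE)[-1, ]

# IDs of the individuals to keep (e.g. from a phenotype table);
# 9999 is deliberately absent from the .sample file
wanted_ids <- c(1003, 1001, 9999)

# Positions of the wanted individuals among the genotype rows
ind_row <- match(wanted_ids, sample_df$ID_1)
ind_row <- ind_row[!is.na(ind_row)]  # drop individuals without genotypes
ind_row  # 3 1
```

The same ind_row can then be reused to subset sample_df, so that external information stays aligned with the rows that were read.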
You shouldn't read from BGEN files more than once. Instead, use
snp_attach to load the "bigSNP" object in any R session from backing files.
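A minimal sketch of this "read once, then attach" pattern, assuming bigsnpr is installed. read_bgen_once() is a hypothetical wrapper (not part of bigsnpr), and the paths in the commented usage are placeholders.

```r
# Hypothetical wrapper (not part of bigsnpr): convert the BGEN files only if
# the backing .rds file does not exist yet, then attach the bigSNP object.
read_bgen_once <- function(bgenfiles, backingfile, list_snp_id, ...) {
  rdsfile <- paste0(backingfile, ".rds")
  if (!file.exists(rdsfile)) {
    # snp_readBGEN() returns the path to the .rds file it creates
    rdsfile <- bigsnpr::snp_readBGEN(
      bgenfiles   = bgenfiles,
      backingfile = backingfile,
      list_snp_id = list_snp_id,
      ...
    )
  }
  bigsnpr::snp_attach(rdsfile)
}

# Usage (placeholder paths; list_snp_id must be prepared beforehand):
# obj <- read_bgen_once("data/chr1.bgen", "data/chr1_subset", list_snp_id)
# G   <- obj$genotypes  # FBM backed by the .bk file
# map <- obj$map        # variant information
```

Later R sessions can skip the wrapper and call snp_attach() on the .rds path directly.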