scanBED {plinkFile}    R Documentation

Traverse variants in a PLINK1 BED fileset

Description

Sequentially visits variants in a PLINK1 BED fileset with a stepping window matrix, and processes each window matrix with a user script given either as a function or as an expression. Meant for data too big to fit in memory.

To read the entire BED into an R matrix, use readBED() instead.

Usage

scanBED(
  pfx,
  FUN,
  ...,
  win = 1,
  iid = 1,
  vid = 1,
  vfr = NULL,
  vto = NULL,
  buf = 2^24,
  simplify = TRUE
)

loopBED(
  pfx,
  EXP,
  GVR = "g",
  win = 1,
  iid = 1,
  vid = 1,
  vfr = NULL,
  vto = NULL,
  buf = 2^24,
  simplify = TRUE
)

Arguments

pfx

prefix of PLINK BED.

FUN

a function to process each window of variants;

...

additional arguments for FUN when scanBED is used.

win

reading window size (def=1 variant per window)

iid

option to read N IID as row names (def=1, see readIID()).

vid

option to read P VID as col names (def=1, see readVID()).

vfr

variant-wise, from where to read (number/proportion, def=1)?

vto

variant-wise, to where then stop (number/proportion, def=P)?

buf

buffer size in bytes (def=2^24, or 16 MB).

simplify

try simplifying the results into an array, or leave them in a list, or specify a function to simplify the said list.

EXP

an R expression to evaluate with each window of variants;

GVR

an R variable name to assign the window to (def="g").

Value

results of all windows processed by the user script.

Functions

BED PLINK1 Binary Pedigree fileset

A popular format to store biallelic dosage genotypes, with three files: "pfx.bed", "pfx.fam", and "pfx.bim".

The triplet is commonly referred to by its shared prefix (pfx), e.g., the X chromosome represented by "chrX.bed", "chrX.fam", and "chrX.bim" is referred to by "chrX".

The binary file "pfx.bed" represents each dosage value with two bits, just enough to encode all four possibilities: 0, 1, or 2 alleles, or missing.
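The two-bit packing can be illustrated with a minimal base-R sketch. The bit codes below follow the PLINK1 specification (four samples per byte, lowest-order bit pair first); the helper name decode_byte is hypothetical and not part of plinkFile, and counting copies of the first allele (A1) as the dosage is an assumption made for illustration only.

```r
# Decode one BED byte into four dosage values (base R only).
# PLINK1 bit codes, read from the lowest-order pair upwards:
#   00 = homozygous A1, 01 = missing, 10 = heterozygous, 11 = homozygous A2
decode_byte <- function(byte)   # hypothetical helper, not a plinkFile function
{
    # split the byte into four 2-bit codes with integer arithmetic
    code <- (byte %/% c(1L, 4L, 16L, 64L)) %% 4L
    # look-up by code + 1; values count A1 alleles (assumed convention)
    c(2L, NA, 1L, 0L)[code + 1L]
}

dsg <- decode_byte(0xDCL)       # 0xDC is 11011100 in binary
print(dsg)
```

A real reader would also drop the padding codes in the last byte of a variant when the number of samples is not a multiple of four.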

The number of variants (P) and samples (N) equal the number of lines in the text files "pfx.bim" and "pfx.fam", respectively.
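As a rough illustration, N and P can be recovered by counting lines, and the size of "pfx.bed" can then be checked against them. The sketch below builds a toy fileset with made-up content in a temporary directory, so every file name and value in it is an assumption; the size relation (3 header bytes plus ceiling(N / 4) bytes per variant) comes from the PLINK1 specification for variant-major files, not from this page.

```r
# Build a toy fileset (made-up content) in a temporary directory.
pfx <- file.path(tempdir(), "toy")
writeLines(sprintf("F%d I%d 0 0 0 -9", 1:5, 1:5), paste0(pfx, ".fam"))
writeLines(sprintf("1 rs%d 0 %d A G", 1:3, 1:3 * 100L), paste0(pfx, ".bim"))

# Count samples and variants from the text files.
N <- length(readLines(paste0(pfx, ".fam")))   # samples, one per line
P <- length(readLines(paste0(pfx, ".bim")))   # variants, one per line

# Each variant takes ceiling(N / 4) bytes, after a 3-byte header.
bpv <- (N + 3L) %/% 4L
writeBin(as.raw(c(0x6c, 0x1b, 0x01, rep(0x00, P * bpv))), paste0(pfx, ".bed"))

# Sanity check: the file size agrees with N and P.
stopifnot(file.size(paste0(pfx, ".bed")) == 3 + P * bpv)
```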

For the detailed specification of the PLINK1 BED genotype format, see the legacy PLINK v1.07 page at:
https://zzz.bwh.harvard.edu/plink/binary.shtml
For the modern use and management of PLINK1 BED, see the PLINK v1.9 page:
https://www.cog-genomics.org/plink/1.9/input#bed

detailed arguments

genotype context

Context information, such as the number of variants and samples, is updated in the window processing environment to ease user scripting, which includes:

e.g. (1) print percentage progress with print(.w / .W * 100);
e.g. (2) use inf <- readBIM(pfx) to read the table of variants before the window visits, then use inf[.i, ] to access meta-data for variants in each window.
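The two hints above might be combined as in the following sketch, which mirrors the loopBED() example in the Examples section. It assumes the plinkFile package and its bundled "000" fileset are installed, and that .i holds the indices of the variants in the current window, as the text implies; treat it as an illustration, not a reference usage.

```r
library(plinkFile)

pfx <- file.path(system.file("extdata", package="plinkFile"), "000")
inf <- readBIM(pfx)   # table of variants, read once before the scan
ret <- list()         # filled by side effect, one element per window

loopBED(pfx,
{
    print(.w / .W * 100)      # (1) percentage progress
    ret[[.w]] <- inf[.i, ]    # (2) meta-data of variants in this window
},
win=13, buf=2^18)

head(ret[[1]])
```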

See Also

readBED()

Examples

## traverse genotype, apply R function without side effects
pfx <- file.path(system.file("extdata", package="plinkFile"), "000")
ret <- scanBED(pfx, function(g)
{
    .af <- colMeans(g, na.rm=TRUE) / 2
    maf <- pmin(.af, 1 - .af)
    mis <- colSums(is.na(g)) / .N
    pct <- round(.w / .W * 100, 2)
    cbind(buf=.b, wnd=.w, idx=.i, MAF=maf, MIS=mis, PCT=pct)
},
vfr=NULL, vto=NULL, win=13, simplify=rbind, buf=2^18)
head(ret)
tail(ret)

## traverse genotype, evaluate R expression with side effects
pfx <- file.path(system.file("extdata", package="plinkFile"), "000.bed")
ret <- list() # use side effect to keep the result of each window.
loopBED(pfx,
{
    af <- colMeans(gt, na.rm=TRUE) / 2
    sg <- af * (1 - af)
    ret[[.w]] <- cbind(wnd=.w, alf=af, var=sg)
},
win=13, GVR="gt", vid=3, buf=2^18)
head(ret)
tail(ret)


[Package plinkFile version 0.2.1 Index]