| scanBED {plinkFile} | R Documentation | 
Traverse variants in a PLINK1 BED fileset
Description
Sequentially visits variants in a PLINK1 BED fileset with a stepping window matrix, and processes each window matrix with a user script given as either a function or an expression. It is meant for data too big to fit in memory.
To read the entire BED into an R matrix, use [readBED]() instead.
Usage
scanBED(
  pfx,
  FUN,
  ...,
  win = 1,
  iid = 1,
  vid = 1,
  vfr = NULL,
  vto = NULL,
  buf = 2^24,
  simplify = TRUE
)
loopBED(
  pfx,
  EXP,
  GVR = "g",
  win = 1,
  iid = 1,
  vid = 1,
  vfr = NULL,
  vto = NULL,
  buf = 2^24,
  simplify = TRUE
)
Arguments
| pfx | prefix of PLINK BED. | 
| FUN | a function to process each window of variants; | 
| ... | additional argument for  | 
| win | reading window size (def=1 variant per window) | 
| iid | option to read  | 
| vid | option to read  | 
| vfr | variant-wise, from where to read (number/proportion, def=1)? | 
| vto | variant-wise, to where then stop (number/proportion, def=P)? | 
| buf | buffer size in bytes (def=2^24, or 16 MB). | 
| simplify | try simplifying the results into an array, or leave them in a list, or specify a function to simplify the said list. | 
| EXP | an R expression to evaluate with each window of variants; | 
| GVR | an R variable name to assign the window to (def="g"). | 
Value
results of all windows processed by the user script.
Functions
-  scanBED(): apply a function to variants in a PLINK1 BED fileset. Traverses P variants via a sliding window while calling a function on each window of variants, without side effects on the calling environment, mimicking R's various apply utilities.
-  loopBED(): evaluate an expression on variants in a PLINK1 BED fileset. Traverses P variants via a sliding window and evaluates an R expression given each window of variants, with side effects on the calling environment, mimicking the syntax of R's for loop.
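The contrast between the two styles can be sketched in base R on an in-memory matrix. This is only an illustration, not how plinkFile is implemented: the toy matrix `g_all`, the window list `wnd`, and the result names are all hypothetical, and no BED file is read.

```r
# Toy genotype matrix: 4 individuals (rows) by 6 variants (columns).
g_all <- matrix(c(0, 1, 2, NA, 1, 1, 0, 2,
                  2, 2, 1, 0, 0, 0, 1, 1,
                  NA, 2, 2, 1, 0, 1, 0, 0), nrow = 4)
win <- 2  # variants per window
wnd <- split(seq_len(ncol(g_all)), ceiling(seq_len(ncol(g_all)) / win))

# scanBED-like style: a function receives each window and returns a value;
# nothing in the calling environment is modified.
res_fun <- lapply(wnd, function(i) {
    colMeans(g_all[, i, drop = FALSE], na.rm = TRUE)
})

# loopBED-like style: the body runs in the calling environment and
# stores its result in `res_exp` as a side effect, as a for loop would.
res_exp <- list()
for (w in seq_along(wnd)) {
    g <- g_all[, wnd[[w]], drop = FALSE]
    res_exp[[w]] <- colMeans(g, na.rm = TRUE)
}
```

Both styles produce the same per-window values; the difference is only where the results end up.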
BED PLINK1 Binary Pedigree fileset
A popular format for storing biallelic dosage genotypes in three files:
-  pfx.fam: text table for N individuals, detailed in readFAM;
-  pfx.bim: text table for P variants, detailed in readBIM;
-  pfx.bed: transposed genotype matrix (P x N) in binary format.
The triplet is commonly referred to by the shared prefix (pfx), e.g., the X
chromosome represented by "chrX.bed", "chrX.fam", and "chrX.bim" is referred
to as "chrX".
The binary file "pfx.bed" represents each dosage value with two bits, just enough to encode all four possibilities: 0, 1, or 2 alleles, or missing.
The number of variants (P) and samples (N) equal the number of lines
in the text files "pfx.bim" and "pfx.fam", respectively.
For the detailed specification of the PLINK1 BED genotype format, see the legacy PLINK v1.07 page at:
https://zzz.bwh.harvard.edu/plink/binary.shtml.
For the modern use and management of PLINK1 BED, see the PLINK v1.9 page:
https://www.cog-genomics.org/plink/1.9/input#bed.
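As a rough illustration of the two-bit encoding, the base R sketch below decodes one byte into four genotypes following the PLINK1 specification linked above. The helper names `decode_byte` and `bed_size` are hypothetical and are not part of plinkFile. Counting copies of the first allele listed in "pfx.bim" is an assumption made here; the dosage convention used by readBED may differ.

```r
# Dosage of the first (A1) allele for each 2-bit code (assumed convention):
# 00 -> 2 copies, 01 -> missing, 10 -> 1 copy, 11 -> 0 copies.
lut <- c(2L, NA, 1L, 0L)

# Decode one byte into four genotypes, lowest-order bit pair first.
decode_byte <- function(byte) {
    code <- (byte %/% c(1L, 4L, 16L, 64L)) %% 4L
    lut[code + 1L]
}

decode_byte(220L)  # 220 is binary 11 01 11 00, read from right to left

# Expected size in bytes of "pfx.bed" in variant-major mode: 3 header
# bytes, then ceiling(N / 4) bytes for each of the P variants.
bed_size <- function(P, N) 3 + P * ceiling(N / 4)
bed_size(P = 1000, N = 10)
```

Comparing `bed_size()` against `file.size()` of an actual "pfx.bed" is a cheap sanity check that the three files of a fileset belong together.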
detailed arguments
-  win: visiting window size. The number of variants per window, that is, the number of columns in each window matrix passed to the user script. For example, a window of size one means the user script deals with only one variant at a time, received as a single-column matrix, in a manner similar to genome-wide association analysis (GWAS). However, a larger, multi-variant window coupled with R's vector and matrix syntax can significantly boost efficiency. The default size is 1 variant / column per window.
-  buf: buffer size in bytes. A large buffer reduces the frequency of hard disk visits when traversing a PLINK1 BED file, which in turn reduces non-computation overhead. The default size is 2^24 bytes, or 16 MB.
-  simplify: when FALSE, the results of the user script for each window of variants are returned in a list; when TRUE, simplify2array is used to put the results into an array, falling back to a list if that fails; when a function is specified, it is used to simplify the results, again falling back to a list if an exception is thrown. E.g., if the window script returns a data frame of estimate, standard error, t-statistic, and p-value for each variant, simplify = rbind combines the results of all windows into one data frame of P rows and four columns of statistics.
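The simplification options can be previewed in base R on a hand-made list of per-window results. This is a sketch under assumptions: the list `res` and its columns are made up for illustration, and calling the user-supplied function through `do.call()` with a `tryCatch()` fallback is a guess at the described behavior, not plinkFile's actual code.

```r
# Hypothetical per-window results: one data frame per window of 2 variants.
res <- list(
    data.frame(est = c(0.1, 0.2), se = c(0.05, 0.04)),
    data.frame(est = c(0.3, 0.4), se = c(0.03, 0.06)))

# simplify = rbind (assumed mechanism): stack the windows row-wise,
# and keep the plain list if the function throws an error.
out <- tryCatch(do.call(rbind, res), error = function(e) res)
dim(out)  # rows are variants, columns are statistics

# simplify = TRUE relies on simplify2array(); equal-length vectors
# become the columns of a matrix, one column per window.
arr <- simplify2array(list(c(1, 2), c(3, 4)))
dim(arr)
```

When windows return objects of unequal shape, keeping the list (simplify = FALSE) and combining it manually afterwards is usually the easier route.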
genotype context
Context information, such as the number of variants and samples, is updated in the window processing environment to ease user scripting. It includes:
-  .i: indices of variants in the current visiting window;
-  .p: number of variants in the current visiting window;
-  .P: total number of variants;
-  .w: index of the current window;
-  .W: total number of windows to go through;
-  .N: number of individuals;
-  .b: index of the current buffer;
-  .B: number of buffers to be swapped.
e.g. (1) print percentage progress with print(.w / .W * 100);
e.g. (2) use inf <- readBIM(pfx) to read the table of variants before the
window visits, then use inf[.i, ] to access the meta-data for variants in
each window.
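To get a feel for how these variables relate to one another, the base R loop below mimics them for a made-up case of 10 variants and a window size of 4. It is a stand-in only: the values of `P` and `win` are hypothetical, and both the shorter last window and the 1-based indices counted from the first variant are assumptions rather than documented behavior.

```r
P <- 10    # hypothetical total number of variants (.P)
win <- 4   # hypothetical window size

.W <- ceiling(P / win)  # total number of windows
for (.w in seq_len(.W)) {
    # indices of the variants in window .w; the last window may be short
    .i <- seq((.w - 1) * win + 1, min(.w * win, P))
    .p <- length(.i)
    cat(sprintf("window %d of %d: %d variants, %.0f%% done\n",
                .w, .W, .p, .w / .W * 100))
}
```

Inside a real scanBED() or loopBED() call these variables are provided by the package, so a user script only reads them.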
See Also
[readBED]
Examples
## traverse genotypes, applying an R function without side effects
pfx <- file.path(system.file("extdata", package="plinkFile"), "000")
ret <- scanBED(pfx, function(g)
{
    .af <- colMeans(g, na.rm=TRUE) / 2
    maf <- pmin(.af, 1 - .af)
    mis <- colSums(is.na(g)) / .N
    pct <- round(.w / .W * 100, 2)
    cbind(buf=.b, wnd=.w, idx=.i, MAF=maf, MIS=mis, PCT=pct)
},
vfr=NULL, vto=NULL, win=13, simplify=rbind, buf=2^18)
head(ret)
tail(ret)
## traverse genotypes, evaluating an R expression with side effects
pfx <- file.path(system.file("extdata", package="plinkFile"), "000.bed")
ret <- list() # use side effect to keep the result of each window.
loopBED(pfx,
{
    af <- colMeans(gt, na.rm=TRUE) / 2
    sg <- af * (1 - af)
    ret[[.w]] <- cbind(wnd=.w, alf=af, var=sg)
},
win=13, GVR="gt", vid=3, buf=2^18)
head(ret)
tail(ret)