pgenlibr-package {pgenlibr}R Documentation

PLINK 2 Binary (.pgen) Reader

Description

A thin wrapper over PLINK 2's core libraries which provides an R interface for reading .pgen files. A minimal .pvar loader is also included.

Details

NewPvar and NewPgen initialize the respective readers. Then, you can either iterate through one variant at a time (Read, ReadAlleles) or perform a multi-variant matrix load (ReadIntList, ReadList). When you're done, ClosePgen and ClosePvar free resources.

Author(s)

Christopher Chang chrchang@alumni.caltech.edu

References

Chang, C.C. and Chow, C.C. and Tellier, L.C.A.M. and Vattikuti, S. and Purcell, S.M. and Lee J.J. (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4:7. doi:10.1186/s13742-015-0047-8.

Examples

  # This is modified from https://yosuketanigawa.com/posts/2020/09/PLINK2 .
  library(pgenlibr)

  # These files are subsetted from downloads available at
  #   https://www.cog-genomics.org/plink/2.0/resources#phase3_1kg .
  # Note that, after downloading the original files, the .pgen file must be
  # decompressed before use; but both pgenlibr and the PLINK 2 program can
  # handle compressed .pvar files.
  pvar_path <- system.file("extdata", "chr21_phase3_start.pvar.zst", package="pgenlibr")
  pgen_path <- system.file("extdata", "chr21_phase3_start.pgen", package="pgenlibr")

  pvar <- pgenlibr::NewPvar(pvar_path)
  pgen <- pgenlibr::NewPgen(pgen_path, pvar=pvar)

  # Check the number of variants and samples.
  pgenlibr::GetVariantCt(pgen)
  pgenlibr::GetRawSampleCt(pgen)

  # Get the ID of the first variant.
  GetVariantId(pvar, 1)

  # Read the 14th variant.
  buf <- pgenlibr::Buf(pgen)
  pgenlibr::Read(pgen, buf, 14)

  # Get the index of the variant with ID "rs569225703".
  var_id <- pgenlibr::GetVariantsById(pvar, "rs569225703")

  # Get allele count.
  pgenlibr::GetAlleleCt(pvar, var_id)

  # It has three alleles, i.e. two ALT alleles.
  # Read first-ALT-allele dosages for that variant.
  pgenlibr::Read(pgen, buf, var_id)

  # Read second-ALT-allele dosages.
  pgenlibr::Read(pgen, buf, var_id, allele_num=3)

  # Read a matrix with both variants.  Note that, for the multiallelic variant,
  # the dosages of both ALT alleles are summed here.
  geno_mat <- pgenlibr::ReadList(pgen, c(14, var_id))

  pgenlibr::ClosePgen(pgen)
  pgenlibr::ClosePvar(pvar)

[Package pgenlibr version 0.3.7 Index]