genobaypass2pooldata {poolfstat}    R Documentation

Convert BayPass read count and haploid pool size input files into a pooldata object

Description

Read a BayPass read count file and the matching haploid pool size file, and convert them into a pooldata object that can be used as input to other poolfstat functions.

Usage

genobaypass2pooldata(
  genobaypass.file = "",
  poolsize.file = "",
  snp.pos = NA,
  poolnames = NA,
  min.cov.per.pool = -1,
  max.cov.per.pool = 1e+06,
  min.maf = -1,
  verbose = TRUE
)

Arguments

genobaypass.file

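The name (or a path) of the BayPass read count file, in which each row corresponds to a SNP and each pool is described by two consecutive columns giving the read counts of the two alleles (see the BayPass manual https://forgemia.inra.fr/mathieu.gautier/baypass_public/)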

poolsize.file

The name (or a path) of the BayPass haploid pool size file, a single line giving the haploid size of each pool in the same order as in the read count file (see the BayPass manual https://forgemia.inra.fr/mathieu.gautier/baypass_public/)

snp.pos

An optional two-column matrix with nsnps rows containing the chromosome (or contig/scaffold) of origin and the position of each marker

poolnames

A character vector with the names of the pools

min.cov.per.pool

Minimal allowed read count (per pool). If at least one pool is covered by fewer than min.cov.per.pool reads, the position is discarded

max.cov.per.pool

Maximal allowed read count (per pool). If at least one pool is covered by more than max.cov.per.pool reads, the position is discarded

min.maf

Minimal allowed Minor Allele Frequency (MAF), computed from the ratio of the overall read count for the reference allele to the overall read coverage (both summed over all pools)

verbose

If TRUE, extra information is printed on the terminal
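The combined effect of min.cov.per.pool, max.cov.per.pool and min.maf on which positions are kept can be sketched in base R. The helper filter.positions below is hypothetical (it is not part of poolfstat) and works on a reference allele read count matrix and a read coverage matrix with SNPs in rows and pools in columns. Two details are assumptions of this sketch rather than documented behavior: a MAF exactly equal to min.maf is kept, and positions with zero overall coverage are dropped.

```r
## Hypothetical helper (not part of poolfstat) mimicking the documented filters.
## refcount, coverage: matrices with nsnp rows and npools columns.
filter.positions <- function(refcount, coverage, min.cov.per.pool = -1,
                             max.cov.per.pool = 1e6, min.maf = -1) {
  ## Coverage filters: every pool must lie within [min.cov.per.pool, max.cov.per.pool]
  cov.ok <- apply(coverage >= min.cov.per.pool & coverage <= max.cov.per.pool,
                  1, all)
  ## Overall reference allele frequency: read counts summed over all pools
  p <- rowSums(refcount) / rowSums(coverage)
  maf <- pmin(p, 1 - p)
  ## Assumptions: zero overall coverage (maf is NaN) is dropped and
  ## a MAF exactly equal to min.maf is kept
  maf.ok <- !is.na(maf) & maf >= min.maf
  which(cov.ok & maf.ok)  # indices of the positions that are kept
}

refcount <- rbind(c(10, 12,   8),
                  c( 0,  1,   0),
                  c(30, 25, 400))
coverage <- rbind(c(20, 25,  15),
                  c(20, 22,  18),
                  c(60, 50, 800))
## Row 2 fails min.maf (MAF = 1/60); row 3 fails max.cov.per.pool (800 > 500)
filter.positions(refcount, coverage, min.cov.per.pool = 10,
                 max.cov.per.pool = 500, min.maf = 0.05)
```

With the default argument values (as in the function signature above), all three toy positions would be kept.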

Details

Information on SNP positions is only required for some graphical displays or to carry out block-jackknife estimation of confidence intervals. If no mapping information is given (default), SNPs are assumed to be ordered along a single chromosome and separated by 1 bp. Because blocks are defined by a number of consecutive SNPs (rather than by a length), this assumption actually has no effect (except on the reported estimated block sizes in Mb).
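Why the default positions only change the reported block sizes can be illustrated with a hypothetical sketch. The helper define.blocks and its argument nsnp.per.block are illustrative names, not poolfstat's API. The sketch splits SNPs into blocks of a fixed number of consecutive SNPs within each chromosome, drops any incomplete trailing block (an assumption), and reports each block's size in Mb as the distance between its first and last SNP.

```r
## Hypothetical helper (not part of poolfstat): assign SNPs to blocks of
## nsnp.per.block consecutive SNPs within each chromosome.
## Assumes SNPs are sorted by chromosome and position.
define.blocks <- function(chrom, pos, nsnp.per.block) {
  block.id <- rep(NA_integer_, length(pos))
  next.id <- 1L
  for (chr in unique(chrom)) {
    idx <- which(chrom == chr)
    ## Assumption: an incomplete trailing block is dropped (block.id stays NA)
    nfull <- length(idx) %/% nsnp.per.block
    for (b in seq_len(nfull)) {
      members <- idx[((b - 1) * nsnp.per.block + 1):(b * nsnp.per.block)]
      block.id[members] <- next.id
      next.id <- next.id + 1L
    }
  }
  ## Block size in Mb: distance between the first and last SNP of each block
  sizes.mb <- tapply(pos, block.id, function(x) (max(x) - min(x)) / 1e6)
  list(block.id = block.id, sizes.mb = as.numeric(sizes.mb))
}

n <- 1000
## Default positions: a single chromosome with SNPs 1 bp apart
unmapped <- define.blocks(rep("chr1", n), seq_len(n), nsnp.per.block = 100)
## Hypothetical real positions: one SNP every 2 kb on the same chromosome
mapped <- define.blocks(rep("chr1", n), seq_len(n) * 2000, nsnp.per.block = 100)
identical(unmapped$block.id, mapped$block.id)  # same SNP-to-block assignment
unmapped$sizes.mb[1]  # 99 bp, i.e. 9.9e-05 Mb
mapped$sizes.mb[1]    # 198,000 bp, i.e. 0.198 Mb
```

Because block membership depends only on the order of the SNPs, the assignment is the same with and without real positions; only the sizes expressed in Mb differ.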

Value

A pooldata object containing 7 elements:

  1. "refallele.readcount": a matrix with nsnp rows and npools columns containing read counts for the reference allele (chosen arbitrarily) in each pool

  2. "readcoverage": a matrix with nsnp rows and npools columns containing read coverage in each pool

  3. "snp.info": a matrix with nsnp rows and four columns containing respectively the contig (or chromosome) name (1st column) and position (2nd column) of the SNP; the allele taken as reference in the refallele.readcount matrix (3rd column); and the alternative allele (4th column)

  4. "poolsizes": a vector of length npools containing the haploid pool sizes

  5. "poolnames": a vector of length npools containing the names of the pools

  6. "nsnp": a scalar corresponding to the number of SNPs

  7. "npools": a scalar corresponding to the number of pools
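How the BayPass input files map onto refallele.readcount, readcoverage and poolsizes can be sketched in base R. The snippet writes a toy read count file (one row per SNP, two columns per pool giving the read counts of the two alleles, as described in the BayPass manual) and a toy pool size file (one line with the haploid size of each pool), then derives the matrices by hand. The file names, whitespace separation and the choice of the first allele of each pair as the reference are assumptions of this sketch; it illustrates the format and does not reproduce genobaypass2pooldata itself.

```r
## Toy BayPass-style files: 3 SNPs, 2 pools (hypothetical file names)
geno.file <- file.path(tempdir(), "toy.genobaypass")
size.file <- file.path(tempdir(), "toy.poolsize")
writeLines(c("12 8 5 25",
             "20 0 18 2",
             "3 7 9 11"), geno.file)
writeLines("50 40", size.file)

counts <- as.matrix(read.table(geno.file))
npools <- ncol(counts) / 2
## Columns 1, 3, ... hold the counts of the first allele in each pool
## (assumed here to be the reference allele); columns 2, 4, ... the second allele
refallele.readcount <- counts[, seq(1, 2 * npools, by = 2), drop = FALSE]
readcoverage <- refallele.readcount + counts[, seq(2, 2 * npools, by = 2), drop = FALSE]
poolsizes <- scan(size.file, quiet = TRUE)

## Drop the V1, V3, ... column names inherited from read.table
dimnames(refallele.readcount) <- dimnames(readcoverage) <- NULL
readcoverage  # rows: SNPs; columns: pools
```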

Examples

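 ## Create the example files (including ex.sync.gz) in a temporary directory
 make.example.files(writing.dir=tempdir())
 ## Read the PoPoolation2 sync file (15 pools, each of haploid size 50)
 pooldata=popsync2pooldata(sync.file=paste0(tempdir(),"/ex.sync.gz"),poolsizes=rep(50,15))
 ## Write the data as BayPass input files (genobaypass and poolsize)
 pooldata2genobaypass(pooldata=pooldata,writing.dir=tempdir())
 ## Read the BayPass files back into a pooldata object
 pooldata=genobaypass2pooldata(genobaypass.file=paste0(tempdir(),"/genobaypass"),
                               poolsize.file=paste0(tempdir(),"/poolsize"))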

[Package poolfstat version 2.2.0 Index]