genotreemix2countdata {poolfstat}	R Documentation

Convert allele count input files from the Treemix program into a countdata object

Description

Convert allele count input files from the Treemix program into a countdata object

Usage

genotreemix2countdata(
  genotreemix.file = "",
  snp.pos = NA,
  min.indgeno.per.pop = -1,
  min.maf = -1,
  verbose = TRUE
)

Arguments

genotreemix.file

The name (or a path) of the Treemix allele count file (see the Treemix manual https://bitbucket.org/nygcresearch/treemix/wiki/Home)
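
As a rough illustration of the expected layout, the sketch below writes a tiny file in the same way as the Examples section of this page: a header line with the population names, then one line per SNP with a "ref,alt" count pair per population, separated by spaces. The counts, the population names (popA, popB) and the file name are made up for illustration; the Treemix manual remains the reference for the format.

```r
# Toy counts: 3 SNPs (rows) x 2 populations (columns); all values are made up.
ref.count <- matrix(c(10, 4, 0,
                      7, 12, 3), ncol = 2)
alt.count <- matrix(c(2, 8, 12,
                      5, 0, 9), ncol = 2)

# Each cell holds "ref,alt" for one SNP in one population.
cells <- matrix(paste(ref.count, alt.count, sep = ","), ncol = ncol(ref.count))
colnames(cells) <- c("popA", "popB")  # hypothetical population names

# Header line = population names; fields are separated by single spaces.
toy.file <- file.path(tempdir(), "toy_genotreemix")
write.table(cells, file = toy.file, quote = FALSE, row.names = FALSE)

toy.lines <- readLines(toy.file)
print(toy.lines)
```

The resulting path could then be passed as genotreemix.file.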

snp.pos

An optional two-column matrix with nsnps rows containing the chromosome (or contig/scaffold) of origin and the position of each marker
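
A minimal sketch of such a matrix, assuming three SNPs with made-up chromosome names and positions. Binding a character vector and a numeric vector with cbind() yields a character matrix; whether the function expects exactly this storage mode is an assumption, not something stated on this page.

```r
# Hypothetical mapping information for 3 SNPs (values are illustrative only).
chrom <- c("chr1", "chr1", "chr2")
pos   <- c(1520, 48211, 907)

# One row per SNP: chromosome in column 1, position in column 2.
snp.pos <- cbind(chrom, pos)
print(dim(snp.pos))
```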

min.indgeno.per.pop

Minimum number of overall counts required in each population. If any population has fewer than min.indgeno.per.pop genotyped (haploid) individuals, the position is discarded

min.maf

Minimum allowed Minor Allele Frequency (computed from the ratio of the overall count for the reference allele to the overall number of (haploid) individuals genotyped)
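
The two filters described for min.indgeno.per.pop and min.maf can be sketched in base R as follows. This is an illustration on made-up count matrices, not the package's internal code: the threshold values are arbitrary, and the use of non-strict comparisons (>=) and of folding the reference allele frequency to obtain the minor allele frequency are assumptions.

```r
# Toy counts: 4 SNPs (rows) x 2 populations (columns); all values are made up.
ref.count   <- matrix(c(10, 0, 1, 6,
                         8, 0, 0, 2), ncol = 2)
total.count <- matrix(c(20, 20, 20, 12,
                        16, 16, 18, 3), ncol = 2)

min.indgeno.per.pop <- 4    # illustrative thresholds
min.maf             <- 0.05

# Count filter: the least-covered population must reach the threshold.
keep.count <- apply(total.count, 1, min) >= min.indgeno.per.pop

# MAF filter: overall reference allele frequency, folded to the minor allele.
ref.freq <- rowSums(ref.count) / rowSums(total.count)
maf      <- pmin(ref.freq, 1 - ref.freq)
keep.maf <- maf >= min.maf   # assumption: non-strict comparison

keep <- keep.count & keep.maf
print(keep)
```

In this toy example, the second and third SNPs fail the frequency filter and the fourth fails the count filter, so only the first SNP is retained.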

verbose

If TRUE, extra information is printed to the terminal

Details

Information on SNP position is only required for some graphical displays or to carry out block-jackknife estimation of confidence intervals. If no mapping information is given (default), SNPs are assumed to be ordered on the same chromosome and separated by 1 bp. As blocks are defined by a number of consecutive SNPs (rather than a length), the latter assumption has no actual effect (except on the reported estimated block sizes in Mb).
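
The point about blocks can be illustrated with a short base R sketch. It assigns SNPs to blocks of a fixed number of consecutive markers and compares the default 1 bp spacing with a set of made-up positions. The variable names (snps.per.block, other.pos) are illustrative, and how the package treats an incomplete final block is not covered here.

```r
nsnp           <- 10
snps.per.block <- 4   # illustrative block size, in number of SNPs

# Default assumption: SNPs on one chromosome, 1 bp apart.
default.pos <- seq_len(nsnp)
# Made-up "real" positions for the same 10 ordered SNPs.
other.pos <- c(12, 950, 4000, 4100, 9000, 15000, 15020, 30000, 31000, 50000)

# Block membership depends only on the SNP order, not on the coordinates.
block.id <- (seq_len(nsnp) - 1) %/% snps.per.block + 1

# Only the physical span of each block (in bp) changes with the positions.
span.default <- tapply(default.pos, block.id, function(p) diff(range(p)))
span.other   <- tapply(other.pos,   block.id, function(p) diff(range(p)))
print(rbind(span.default, span.other))
```

Both sets of positions give the same block assignment; only the reported block spans differ.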

Value

A countdata object containing 6 elements:

  1. "refallele.count": a matrix (nsnp rows and npops columns) with the allele counts for the reference allele

  2. "total.count": a matrix (nsnp rows and npops columns) with the total number of counts (i.e., twice the number of genotyped individuals for diploid species and autosomal markers)

  3. "snp.info": a matrix with nsnp rows and four columns containing respectively the contig (or chromosome) name (1st column) and position (2nd column) of the SNP; the allele taken as reference in the refallele.count matrix (3rd column); and the alternative allele (4th column)

  4. "popnames": a vector of length npops containing the names of the pops

  5. "nsnp": a scalar corresponding to the number of SNPs

  6. "npops": a scalar corresponding to the number of populations

Examples

 make.example.files(writing.dir=tempdir())
 pooldata=popsync2pooldata(sync.file=paste0(tempdir(),"/ex.sync.gz"),poolsizes=rep(50,15))
 ##NOTE: This example is for illustration only, as it amounts to
 ##interpreting read counts as allele counts, which must not be done in practice!
 dum=matrix(paste(pooldata@refallele.readcount,
   pooldata@readcoverage-pooldata@refallele.readcount,sep=","),
   ncol=pooldata@npools)
 colnames(dum)=pooldata@poolnames
 write.table(dum,file=paste0(tempdir(),"/genotreemix"),quote=FALSE,row.names=FALSE)
 countdata=genotreemix2countdata(genotreemix.file=paste0(tempdir(),"/genotreemix"))

[Package poolfstat version 2.2.0 Index]