
genotreemix2countdata {poolfstat}

R Documentation

Convert allele count input files from the Treemix program into a countdata object

Description

Convert an allele count input file in the Treemix format into a countdata object

Usage

genotreemix2countdata(
  genotreemix.file = "",
  snp.pos = NA,
  min.indgeno.per.pop = -1,
  min.maf = -1,
  verbose = TRUE
)

Arguments

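`genotreemix.file`	The name (or path) of the Treemix allele count file, i.e., a space-separated file with a header line of population names followed by one line per SNP giving, for each population, the counts of the two alleles separated by a comma (see the Treemix manual https://bitbucket.org/nygcresearch/treemix/wiki/Home)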
`snp.pos`	An optional two-column matrix with nsnps rows containing the chromosome (or contig/scaffold) of origin and the position of each marker
`min.indgeno.per.pop`	Minimum number of overall counts (i.e., genotyped haploid individuals) required in each population. A position is discarded if at least one population has fewer than min.indgeno.per.pop genotyped (haploid) individuals
`min.maf`	Minimum allowed minor allele frequency (MAF), computed from the ratio of the overall count of the reference allele to the overall number of genotyped (haploid) individuals
`verbose`	If TRUE, extra information is printed to the terminal
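To make the expected file layout and the two filtering arguments concrete, here is a minimal base-R sketch. The helper parse_treemix_counts() is hypothetical and is not poolfstat's implementation; in particular, it assumes that SNPs with an overall MAF strictly below min.maf (or with no counts at all) are dropped, and that fields are separated by single spaces.

```r
# Hypothetical helper (NOT poolfstat's implementation): parse the lines of a
# Treemix-style allele count file and apply the two filters described above.
parse_treemix_counts <- function(lines, min.indgeno.per.pop = -1, min.maf = -1) {
  # Header line: population names separated by single spaces
  popnames <- strsplit(lines[1], " ", fixed = TRUE)[[1]]
  npops <- length(popnames)
  # Each remaining line holds one "ref,alt" field per population
  rows <- strsplit(lines[-1], " ", fixed = TRUE)
  counts <- t(vapply(rows,
                     function(f) as.integer(unlist(strsplit(f, ",", fixed = TRUE))),
                     integer(2 * npops)))
  # Odd columns = reference allele counts, even columns = alternative counts
  ref <- counts[, c(TRUE, FALSE), drop = FALSE]
  total <- ref + counts[, c(FALSE, TRUE), drop = FALSE]
  colnames(ref) <- popnames
  colnames(total) <- popnames
  # Drop a SNP if any population has fewer than min.indgeno.per.pop counts
  keep <- rowSums(total < min.indgeno.per.pop) == 0
  # Overall MAF from the counts summed over all populations
  p <- rowSums(ref) / rowSums(total)
  maf <- pmin(p, 1 - p)
  # Assumption: SNPs with MAF below min.maf (or with no counts) are dropped
  keep <- keep & !is.na(maf) & maf >= min.maf
  list(refallele.count = ref[keep, , drop = FALSE],
       total.count = total[keep, , drop = FALSE],
       popnames = popnames)
}

# Toy input: 3 populations, 3 SNPs
lines <- c("pop1 pop2 pop3",
           "5,1 3,3 0,6",
           "6,0 6,0 5,1",
           "2,0 4,2 3,3")
res <- parse_treemix_counts(lines, min.indgeno.per.pop = 4, min.maf = 0.1)
res$refallele.count  # only the first SNP passes both filters
```

Here the second SNP is removed by the MAF filter (overall MAF = 1/18) and the third by the per-population filter (pop1 has only 2 counts). For a real file, the lines could be obtained with readLines().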

Details

Information on SNP positions is only required for some graphical displays or to carry out block-jackknife estimation of confidence intervals. If no mapping information is given (default), SNPs are assumed to be ordered along a single chromosome and separated by 1 bp. Because blocks are defined by a number of consecutive SNPs (rather than by a physical length), this assumption has no actual effect (except on the reported estimated block sizes in Mb).
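The following base-R sketch illustrates why the default positions do not change the jackknife blocks. The helpers default_snp_info(), assign_blocks() and block_sizes_mb() are hypothetical and do not reproduce poolfstat's internals; they only assume blocks made of a fixed number of consecutive SNPs per chromosome, with block size reported as the span from the first to the last SNP.

```r
# Hypothetical illustration (not poolfstat's internals): default SNP
# positions and block assignment by a fixed number of consecutive SNPs.
default_snp_info <- function(nsnp) {
  # All SNPs on one chromosome, 1 bp apart (the default described above)
  data.frame(chr = rep("chr1", nsnp), pos = seq_len(nsnp))
}

assign_blocks <- function(snp.info, nsnp.per.block) {
  # Within-chromosome SNP index; only SNP order matters, not positions
  idx <- ave(seq_len(nrow(snp.info)), snp.info$chr, FUN = seq_along)
  paste(snp.info$chr, ceiling(idx / nsnp.per.block), sep = "_")
}

block_sizes_mb <- function(snp.info, block) {
  # Span of each block from its first to its last SNP position, in Mb
  tapply(snp.info$pos, block, function(p) (max(p) - min(p)) / 1e6)
}

info_default <- default_snp_info(10)
info_real <- data.frame(chr = rep("chr1", 10),
                        pos = seq(1e5, by = 2e5, length.out = 10))
b1 <- assign_blocks(info_default, 5)
b2 <- assign_blocks(info_real, 5)
identical(b1, b2)                   # TRUE: the same SNPs fall in the same blocks
block_sizes_mb(info_default, b1)    # 4e-06 for each block
block_sizes_mb(info_real, b2)       # 0.8 for each block
```

Only the reported sizes differ between the two position sets. The block membership, and hence any jackknife estimate built on it, stays the same.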

Value

A countdata object containing 6 elements:

"refallele.count": a matrix (nsnp rows and npops columns) with the allele counts for the reference allele
"total.count": a matrix (nsnp rows and npops columns) with the total number of counts (i.e., twice the number of genotyped individuals for diploid species and autosomal markers)
"snp.info": a matrix with nsnp rows and four columns containing respectively the contig (or chromosome) name (1st column) and position (2nd column) of the SNP; the allele taken as reference in the refallele.count matrix (3rd column); and the alternative allele (4th column)
"popnames": a vector of length npops containing the names of the populations
"nsnp": a scalar corresponding to the number of SNPs
"npops": a scalar corresponding to the number of populations
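Because total.count holds the number of sampled allele copies per population (e.g., two per genotyped diploid individual), per-population reference allele frequencies are element-wise ratios of the two count matrices. A minimal base-R sketch follows. The helper ref_allele_freq() is hypothetical, and the toy matrices merely stand in for the refallele.count and total.count elements; how these elements are accessed on a real countdata object is not shown here.

```r
# Hypothetical helper: per-population reference allele frequencies from the
# refallele.count and total.count matrices of a countdata-like object.
ref_allele_freq <- function(refallele.count, total.count) {
  freq <- refallele.count / total.count
  # A population with no genotyped allele copies at a SNP has no frequency
  freq[total.count == 0] <- NA
  freq
}

# Toy matrices standing in for the two elements (2 SNPs x 3 populations)
pops <- c("pop1", "pop2", "pop3")
refallele.count <- matrix(c(5, 6, 3, 6, 0, 0), nrow = 2,
                          dimnames = list(NULL, pops))
total.count <- matrix(c(6, 6, 6, 6, 6, 0), nrow = 2,
                      dimnames = list(NULL, pops))
freq <- ref_allele_freq(refallele.count, total.count)
freq  # nrow(freq) corresponds to nsnp, ncol(freq) to npops
```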

Examples

 library(poolfstat)
 ## Write the example files (including ex.sync.gz) to a temporary directory
 make.example.files(writing.dir=tempdir())
 pooldata=popsync2pooldata(sync.file=paste0(tempdir(),"/ex.sync.gz"),poolsizes=rep(50,15))
 ##NOTE: This example is only for the sake of illustration, as it amounts
 ##to interpreting read counts as allele counts, which must not be done in practice!
 ## Build a Treemix-style table: one "ref,alt" count string per SNP and pool
 dum=matrix(paste(pooldata@refallele.readcount,
   pooldata@readcoverage-pooldata@refallele.readcount,sep=","),
   ncol=pooldata@npools)
 colnames(dum)=pooldata@poolnames
 ## The header line holds the pool names; fields are separated by single spaces
 write.table(dum,file=paste0(tempdir(),"/genotreemix"),quote=FALSE,row.names=FALSE)
 countdata=genotreemix2countdata(genotreemix.file=paste0(tempdir(),"/genotreemix"))

[Package poolfstat version 2.2.0 Index]