genotreemix2countdata {poolfstat} | R Documentation |
Convert allele count input files from the Treemix program into a countdata object
Description
Convert allele count input files from the Treemix program into a countdata object
Usage
genotreemix2countdata(
genotreemix.file = "",
snp.pos = NA,
min.indgeno.per.pop = -1,
min.maf = -1,
verbose = TRUE
)
Arguments
genotreemix.file |
The name (or a path) of the Treemix allele count file (see the Treemix manual https://bitbucket.org/nygcresearch/treemix/wiki/Home) |
snp.pos |
An optional two-column matrix with nsnps rows containing the chromosome (or contig/scaffold) of origin and the position of each marker |
min.indgeno.per.pop |
Minimal number of overall counts required in each population. If at least one population is not genotyped for at least min.indgeno.per.pop (haploid) individuals, the position is discarded |
min.maf |
Minimal allowed Minor Allele Frequency (computed from the ratio of the overall counts for the reference allele to the overall number of (haploid) individuals genotyped) |
verbose |
If TRUE extra information is printed on the terminal |
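The two filtering arguments can be illustrated with a small base R sketch on toy count matrices. This is not the package's implementation: the object names (ref_count, total_count, min_indgeno, min_maf) are hypothetical, and whether the thresholds are applied with a strict or non-strict inequality is an assumption made here for illustration.

```r
# Toy data: 4 SNPs (rows) x 3 populations (columns); matrix() fills by column.
ref_count <- matrix(c(10, 0, 5, 2,
                       8, 1, 6, 0,
                       9, 0, 4, 3), ncol = 3)
total_count <- matrix(c(20, 20, 10, 4,
                        16, 18, 12, 2,
                        18, 20,  8, 6), ncol = 3)

min_indgeno <- 4    # hypothetical stand-in for min.indgeno.per.pop
min_maf     <- 0.05 # hypothetical stand-in for min.maf

# A SNP passes the count filter only if every population reaches the threshold.
keep_count <- apply(total_count, 1, min) >= min_indgeno

# Overall reference allele frequency, folded to obtain the minor allele frequency.
ref_freq <- rowSums(ref_count) / rowSums(total_count)
maf      <- pmin(ref_freq, 1 - ref_freq)
keep_maf <- maf >= min_maf  # assumption: non-strict comparison

keep <- keep_count & keep_maf
print(keep)
```

With these toy values, the second SNP fails the frequency filter and the fourth fails the count filter. The default value of -1 for both arguments makes every SNP pass under this logic.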
Details
Information on SNP position is only required for some graphical displays or to carry out block-jackknife estimation of confidence intervals. If no mapping information is given (default), SNPs are assumed to be ordered on the same chromosome and separated by 1 bp. As blocks are defined by a number of consecutive SNPs (rather than a length), this assumption has no actual effect (except on the reported estimated block sizes in Mb).
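As a rough base R illustration of the paragraph above, the sketch below builds a placeholder position matrix like the one implied by the default, a user-supplied alternative in the two-column layout expected by snp.pos, and a block assignment based on SNP order only. The chromosome label "chr1", the block size, and all object names are assumptions made for this example; the package's internal representation and block construction may differ.

```r
nsnps <- 10

# Placeholder positions: a single chromosome, SNPs 1 bp apart.
# The label "chr1" is arbitrary and only used for illustration.
default_pos <- cbind(rep("chr1", nsnps), seq_len(nsnps))

# A user-supplied alternative: chromosome in column 1, position in column 2.
user_pos <- cbind(rep(c("chr1", "chr2"), each = 5),
                  c(1200, 5300, 9100, 15000, 22000,
                    800, 4100, 7700, 12000, 30000))

# Blocks made of a fixed number of consecutive SNPs depend on SNP order,
# not on the physical distance between SNPs.
block_size <- 5
block_id <- ceiling(seq_len(nsnps) / block_size)
print(block_id)
```

A matrix such as user_pos could then be passed through the snp.pos argument, provided it has one row per SNP of the Treemix file.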
Value
A countdata object containing 6 elements:
"refallele.count": a matrix (nsnp rows and npops columns) with the allele counts for the reference allele
"total.count": a matrix (nsnp rows and npops columns) with the total number of counts (i.e., twice the number of genotyped individuals for diploid species and autosomal markers)
"snp.info": a matrix with nsnp rows and four columns containing, respectively, the contig (or chromosome) name (1st column); the position of the SNP (2nd column); the allele taken as reference in the refallele.count matrix (3rd column); and the alternative allele (4th column)
"popnames": a vector of length npops containing the names of the pops
"nsnp": a scalar corresponding to the number of SNPs
"npops": a scalar corresponding to the number of populations
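To make the relation between the input file and the two count matrices concrete, here is a minimal base R sketch that parses a few Treemix-style lines (a header of population names, then one line per SNP with comma-separated allele count pairs). It does not reproduce the function's code: the population names and object names are hypothetical, and treating the first count of each pair as the reference allele is an assumption that mirrors the way the example file is written in the Examples section.

```r
# Three populations, two SNPs, in Treemix-style "count1,count2" pairs.
treemix_lines <- c("popA popB popC",
                   "10,10 8,8 9,9",
                   "0,20 1,17 0,20")

raw <- read.table(text = treemix_lines, header = TRUE,
                  colClasses = "character", stringsAsFactors = FALSE)

# Flatten column by column, then split each "a,b" cell into its two counts.
cells <- unlist(raw, use.names = FALSE)
parts <- strsplit(cells, ",", fixed = TRUE)
ref_n <- as.integer(vapply(parts, `[`, character(1), 1))
alt_n <- as.integer(vapply(parts, `[`, character(1), 2))

# Rebuild nsnp x npops matrices analogous to refallele.count and total.count.
refallele_count <- matrix(ref_n, nrow = nrow(raw),
                          dimnames = list(NULL, names(raw)))
total_count <- matrix(ref_n + alt_n, nrow = nrow(raw),
                      dimnames = list(NULL, names(raw)))

print(refallele_count)
print(total_count)
```

In this sketch, nsnp corresponds to nrow(refallele_count), npops to ncol(refallele_count), and popnames to colnames(refallele_count).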
Examples
make.example.files(writing.dir=tempdir())
pooldata=popsync2pooldata(sync.file=paste0(tempdir(),"/ex.sync.gz"),poolsizes=rep(50,15))
##NOTE: This example is for illustration only: it amounts to interpreting
##read counts as allele counts, which must not be done in practice!
dum=matrix(paste(pooldata@refallele.readcount,
pooldata@readcoverage-pooldata@refallele.readcount,sep=","),
ncol=pooldata@npools)
colnames(dum)=pooldata@poolnames
write.table(dum,file=paste0(tempdir(),"/genotreemix"),quote=FALSE,row.names=FALSE)
countdata=genotreemix2countdata(genotreemix.file=paste0(tempdir(),"/genotreemix"))