countdata.subset {poolfstat} | R Documentation |
Create a subset of a countdata object that contains count data as a function of pop or SNP indexes
Description
Create a subset of a countdata object that contains count data as a function of pop or SNP indexes
Usage
countdata.subset(
countdata,
pop.index = 1:countdata@npops,
snp.index = 1:countdata@nsnp,
min.indgeno.per.pop = -1,
min.maf = -1,
return.snp.idx = FALSE,
verbose = TRUE
)
Arguments
countdata |
A countdata object containing allele count information |
pop.index |
Indexes of the pops (at least two) to select when creating the new countdata object (default=all the pops) |
snp.index |
Indexes of the SNPs (at least two) to select when creating the new countdata object (default=all the SNPs) |
min.indgeno.per.pop |
Minimal number of overall counts required in each population. If at least one pop is genotyped for fewer than min.indgeno.per.pop (haploid) individuals, the position is discarded |
min.maf |
Minimal allowed Minor Allele Frequency (computed from the ratio of the overall counts for the reference allele to the overall number of (haploid) individuals genotyped) |
return.snp.idx |
If TRUE, the row.names of the snp.info slot of the returned countdata object are named as "rsx" where x is the index of the SNP in the initial countdata object (default=FALSE) |
verbose |
If TRUE, report some information about the subsetting |
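The two SNP filters described above can be sketched in base R on toy matrices. This is only an illustration of the documented criteria, not the package's implementation: the count values and thresholds are made up, and the strict comparison used for min.maf is an assumption.

```r
# Toy allele counts: 5 SNPs (rows) x 3 pops (columns); all values are made up.
refallele.count <- matrix(c(10, 12,  9,
                             0,  0,  1,
                             5,  2,  6,
                            20, 20, 19,
                             7,  8,  0),
                          nrow = 5, byrow = TRUE)
total.count <- matrix(c(20, 20, 20,
                        20, 20, 20,
                        10,  3, 12,
                        20, 20, 20,
                        16, 16, 16),
                      nrow = 5, byrow = TRUE)

# Hypothetical thresholds chosen for illustration
min.indgeno.per.pop <- 5
min.maf <- 0.05

# A SNP passes only if every pop has at least min.indgeno.per.pop counts
enough.counts <- apply(total.count, 1, min) >= min.indgeno.per.pop

# Overall reference allele frequency across pops, folded into a MAF
ref.freq <- rowSums(refallele.count) / rowSums(total.count)
maf <- pmin(ref.freq, 1 - ref.freq)

# Strict ">" is an assumption; the documentation does not state it
keep <- enough.counts & maf > min.maf
kept.idx <- which(keep)
print(kept.idx)
```

Here SNP 3 fails the count criterion (one pop has only 3 counts), and SNPs 2 and 4 fail the MAF criterion because they are nearly fixed for one allele.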
Details
This function allows subsetting a countdata object by selecting only some pops and/or some SNPs (e.g., based on their position on the genome). Additional filtering steps on SNPs can be carried out on the resulting subset to discard SNPs with low polymorphism (min.maf) or with too few counts in at least one pop (min.indgeno.per.pop).
Coverage criteria applied on a per-pool basis with a cov.qthres.per.pool argument concern read count data; that argument does not appear in the Usage above. With such criteria, if qmax=0.95, a position is discarded if in a given pool it has a number of reads higher than the 95th percentile of the empirical coverage distribution in this same pool (defined over the SNPs selected by snp.index). Similarly, with a lower quantile threshold of 0.05, a position is discarded if in a given pool it has a number of reads lower than the 5th percentile of the empirical coverage distribution in this same pool. This mode of selection may be more relevant when considering pools with heterogeneous read coverages.
Value
A countdata object with 6 elements:
"refallele.count": a matrix (nsnp rows and npops columns) with the allele counts for the reference allele
"total.count": a matrix (nsnp rows and npops columns) with the total number of counts (i.e., twice the number of genotyped individuals for diploid species and autosomal markers)
"snp.info": a matrix with nsnp rows and four columns containing respectively the contig (or chromosome) name (1st column) and position (2nd column) of the SNP; the allele taken as reference in the refallele.count matrix (3rd column); and the alternative allele (4th column)
"popnames": a vector of length npops containing the names of the pops
"nsnp": a scalar corresponding to the number of SNPs
"npops": a scalar corresponding to the number of populations
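As a rough sketch of how the six elements relate under subsetting, the base R code below applies snp.index and pop.index to a plain list with the same element names. The list is a hypothetical stand-in for the real countdata object (whose slots are accessed with @, as in Usage), and every value in it is made up.

```r
# A plain list standing in for a countdata object (not the actual class)
toy <- list(
  refallele.count = matrix(1:12, nrow = 4, ncol = 3),
  total.count     = matrix(20, nrow = 4, ncol = 3),
  snp.info        = cbind(Chromosome = rep("chr1", 4),
                          Position   = c(100, 250, 400, 900),
                          RefAllele  = "A",
                          AltAllele  = "G"),
  popnames        = c("P1", "P2", "P3"),
  nsnp            = 4,
  npops           = 3
)

snp.index <- c(2, 4)  # rows (SNPs) to keep
pop.index <- c(1, 3)  # columns (pops) to keep

# Rows follow snp.index, columns follow pop.index; drop = FALSE keeps matrices
sub <- list(
  refallele.count = toy$refallele.count[snp.index, pop.index, drop = FALSE],
  total.count     = toy$total.count[snp.index, pop.index, drop = FALSE],
  snp.info        = toy$snp.info[snp.index, , drop = FALSE],
  popnames        = toy$popnames[pop.index],
  nsnp            = length(snp.index),
  npops           = length(pop.index)
)

str(sub)
```

The count matrices shrink in both dimensions, snp.info only loses rows, and nsnp and npops are recomputed so that they stay consistent with the matrix dimensions.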
See Also
To generate a countdata object, see genobaypass2countdata, genotreemix2countdata
Examples
make.example.files(writing.dir=tempdir())
pooldata=popsync2pooldata(sync.file=paste0(tempdir(),"/ex.sync.gz"),poolsizes=rep(50,15))
pooldata2genobaypass(pooldata=pooldata,writing.dir=tempdir())
##NOTE: This example is just for the sake of illustration as it amounts to
##interpret read count as allele count which must not be done in practice!
countdata=genobaypass2countdata(genobaypass.file=paste0(tempdir(),"/genobaypass"))
subset.by.snps=countdata.subset(countdata,snp.index=10:100)
subset.by.pops.and.snps=countdata.subset(countdata,pop.index=c(1,2),snp.index=10:100)