remove_realReads {poolABC}    R Documentation

Remove sites, according to their coverage, from real data

Description

Removes sites that have too many or too few reads from the dataset.

Usage

remove_realReads(nPops, data, minimum, maximum)

Arguments

nPops

is an integer representing the total number of populations in the dataset.

data

is a dataset containing information about real populations. This dataset should have lists with the allelic frequencies, the position of the SNPs, the range of the contig, the number of major allele reads, the number of minor allele reads and the depth of coverage.

minimum

the minimum depth of coverage allowed, i.e. sites where the depth of coverage of any population is below this threshold are removed.

maximum

the maximum depth of coverage allowed, i.e. sites where the depth of coverage of any population is above this threshold are removed.

Details

The minimum and maximum inputs define, respectively, the minimum and maximum coverage allowed in the dataset. The coverage of each population at each site is compared with these thresholds, and any site where the coverage of at least one population is below the minimum or above the maximum is removed entirely from the dataset.
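The filtering rule described above can be illustrated on a toy coverage matrix using base R only. This is a minimal sketch, not the package's internal code: the matrix values and the object names (coverage, keep, filtered) are made up for illustration, and the thresholds are treated as inclusive because only values strictly below the minimum or strictly above the maximum are described as removed.

```r
# Toy coverage matrix for a single contig (hypothetical values):
# each row is a site and each column is a population.
coverage <- matrix(c(12, 50,  9, 200,
                     15, 60, 30, 100),
                   ncol = 2,
                   dimnames = list(paste0("site", 1:4), c("pop1", "pop2")))

minimum <- 10
maximum <- 180

# A site is kept only if every population has a coverage inside
# [minimum, maximum]; one population outside the interval is enough
# to discard the whole site.
keep <- apply(coverage >= minimum & coverage <= maximum, 1, all)

# drop = FALSE keeps the result a matrix even if a single site survives.
filtered <- coverage[keep, , drop = FALSE]
filtered
```

In this toy example site3 is discarded because pop1 has only 9 reads, and site4 is discarded because pop1 has 200 reads, even though pop2 is within the limits at both sites.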

Value

a list with the following elements:

freqs

a list with the allele frequencies, computed by dividing the number of minor-allele reads by the total coverage. Each entry of this list corresponds to a different contig. Each entry is a matrix where each row is a different site and each column is a different population.

positions

a list with the positions of each SNP. Each entry of this list is a vector corresponding to a different contig.

range

a list with the minimum and maximum SNP position of each contig. Each entry of this list is a vector corresponding to a different contig.

rMajor

a list with the number of major-allele reads. Each entry of this list corresponds to a different contig. Each entry is a matrix where each row is a different site and each column is a different population.

rMinor

a list with the number of minor-allele reads. Each entry of this list corresponds to a different contig. Each entry is a matrix where each row is a different site and each column is a different population.

coverage

a list with the total coverage. Each entry of this list corresponds to a different contig. Each entry is a matrix where each row is a different site and each column is a different population.

This output has the same structure as the data input; the only difference is the removal of sites with too many or too few reads.
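To make the shape of this list concrete, the sketch below builds a small dataset with the same element names and drops the same sites from every per-site element. It is an illustration under stated assumptions, not the implementation of remove_realReads: the helper filter_sketch and all values are hypothetical, and the range element is left untouched because the documentation does not say whether it is recomputed after filtering.

```r
# Hypothetical dataset: two contigs, two populations (columns).
rMajor <- list(matrix(c(10, 40, 5, 12, 50, 25), ncol = 2),
               matrix(c(20, 150, 18, 90), ncol = 2))
rMinor <- list(matrix(c(2, 10, 4, 3, 10, 5), ncol = 2),
               matrix(c(5, 50, 2, 10), ncol = 2))
coverage <- Map(`+`, rMajor, rMinor)

toy <- list(freqs     = Map(`/`, rMinor, coverage),
            positions = list(c(101, 250, 310), c(40, 95)),
            range     = list(c(101, 310), c(40, 95)),
            rMajor    = rMajor,
            rMinor    = rMinor,
            coverage  = coverage)

# Hypothetical helper: keeps the list structure and removes, from every
# per-site element, the sites where any population falls outside the limits.
filter_sketch <- function(data, minimum, maximum) {
  # One logical vector per contig: TRUE for sites to keep.
  keep <- lapply(data$coverage, function(cov) {
    rowSums(cov < minimum | cov > maximum) == 0
  })
  subset_rows <- function(mats) {
    Map(function(m, k) m[k, , drop = FALSE], mats, keep)
  }
  for (element in c("freqs", "rMajor", "rMinor", "coverage")) {
    data[[element]] <- subset_rows(data[[element]])
  }
  data$positions <- Map(function(p, k) p[k], data$positions, keep)
  data
}

out <- filter_sketch(toy, minimum = 10, maximum = 180)

# Number of sites remaining in each contig.
sapply(out$coverage, nrow)
```

Because the same logical vector is applied to every element of a contig, the rows of freqs, rMajor, rMinor and coverage stay aligned with the entries of positions after filtering.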

Examples

# load the data from one rc file
data(rc1)

# clean and organize the data in this single file
mydata <- cleanData(file = rc1, pops = 7:10)

# organize the information by contigs
mydata <- prepareFile(data = mydata, nPops = 4)

# remove sites with fewer than 10 reads or more than 180 reads
remove_realReads(nPops = 4, data = mydata, minimum = 10, maximum = 180)


[Package poolABC version 1.0.0 Index]