R: Organize information by contigs - for a single data file

prepareFile {poolABC}

R Documentation

Organize information by contigs - for a single data file

Description

Organize the information of a single _rc file into different entries for each contig.

Usage

prepareFile(data, nPops, filter = FALSE, threshold = NA)

Arguments

`data`	is a list with four entries, named "rMajor", "rMinor", "coverage" and "info". The `rMajor` and `rMinor` entries should be matrices containing the number of observed major-allele and minor-allele reads, respectively. The `coverage` entry should be a matrix containing the total depth of coverage. The `info` entry should be a matrix or a data frame containing the remaining relevant information, such as the contig name and the position of each SNP. In every entry, each row should be a different site; in the `rMajor`, `rMinor` and `coverage` matrices, each column should be a different population.
`nPops`	is an integer indicating the total number of different populations in the dataset.
`filter`	is a logical switch, either TRUE or FALSE. If TRUE, the data are filtered by the frequency of the minor allele; if FALSE, that filter is not applied.
`threshold`	is the minimum allowed frequency for the minor allele. Sites where the minor-allele frequency is below this threshold are removed from the data. It is only used when `filter` is TRUE; if left as NA, the threshold is assumed to be 1/total coverage.
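
As a rough illustration of the shape described for `data`, the sketch below builds a toy list by hand using base R only. All values, the object name `toy` and the `info` column names (`contig`, `position`) are invented for this sketch; whether prepareFile accepts a hand-built list like this one, rather than the output of cleanData, is an assumption that is not confirmed here.

```r
# Toy input: 4 sites (rows) x 2 populations (columns); all values are invented.
# matrix() fills column by column, so the first four numbers are population 1.
rMinor <- matrix(c(2, 0, 5, 1,
                   3, 0, 4, 0), ncol = 2)
coverage <- matrix(c(20, 18, 25, 30,
                     22, 19, 24, 28), ncol = 2)

# Major-allele reads are the reads that are not minor-allele reads.
rMajor <- coverage - rMinor

# Hypothetical column names: the documentation only states that `info`
# holds details such as the contig name and the position of each SNP.
info <- data.frame(contig = c("c1", "c1", "c2", "c2"),
                   position = c(10, 55, 7, 120))

# The four entry names are the ones required by the `data` argument.
toy <- list(rMajor = rMajor, rMinor = rMinor,
            coverage = coverage, info = info)
```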

Details

This function removes all monomorphic sites from the dataset. Monomorphic sites are those where the allele frequency is 0 or 1 in all populations. Then, the name of each contig is used to organize the information on a per-contig basis, so every element of the output is organized by contig. For example, the list with the number of minor-allele reads contains several entries, and each of those entries corresponds to a different contig.
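
The two steps described above (dropping monomorphic sites, then grouping by contig) can be sketched in base R as follows. This is an illustration under stated assumptions and not the package's internal code: the toy values and all object names are invented, and a site is treated as monomorphic when its frequency is 0 in every population or 1 in every population.

```r
# Toy data: 4 sites (rows) x 2 populations (columns); values are invented.
rMinor <- matrix(c(2, 0, 5, 1, 3, 0, 4, 0), ncol = 2)
coverage <- matrix(c(20, 18, 25, 30, 22, 19, 24, 28), ncol = 2)
contig <- c("c1", "c1", "c2", "c2")
position <- c(10, 55, 7, 120)

# Minor-allele frequency per site and population.
freqs <- rMinor / coverage

# Assumption: monomorphic means all populations at 0 or all populations at 1.
monomorphic <- rowSums(freqs == 0) == ncol(freqs) |
  rowSums(freqs == 1) == ncol(freqs)
keep <- !monomorphic

# Group the indices of the retained rows by contig name.
idx <- split(which(keep), contig[keep])

# One matrix (or vector) per contig; drop = FALSE keeps single-site
# contigs as one-row matrices instead of plain vectors.
freqsByContig <- lapply(idx, function(i) freqs[i, , drop = FALSE])
positionsByContig <- lapply(idx, function(i) position[i])
rangeByContig <- lapply(positionsByContig, range)
```

In this toy example the second site has no minor-allele reads in either population, so it is dropped and contig `c1` keeps a single site.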

If the `filter` input is set to TRUE, this function also filters the data by the frequency of the minor allele. If a threshold is supplied, the computed frequency is compared to that threshold, and sites where the frequency is below the threshold are removed from the dataset. If no threshold is supplied, the threshold is assumed to be 1/total coverage, meaning that a site must have at least one minor-allele read to be kept.
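
A minimal sketch of this filter, written in base R, is shown below. The helper `filterSites` is hypothetical and is not part of poolABC. It also assumes that the "computed frequency" is pooled across populations (total minor-allele reads divided by total coverage at each site); the documentation does not state this explicitly.

```r
# Hypothetical helper: returns TRUE for the sites that pass the filter.
filterSites <- function(rMinor, coverage, threshold = NA) {
  # Assumption: reads are pooled across populations for each site.
  totalCov <- rowSums(coverage)
  totalFreq <- rowSums(rMinor) / totalCov
  # With no threshold, require the equivalent of one minor-allele read.
  if (is.na(threshold)) threshold <- 1 / totalCov
  totalFreq >= threshold
}

# Toy data: 4 sites (rows) x 2 populations (columns); values are invented.
rMinor <- matrix(c(2, 0, 5, 1, 3, 0, 4, 0), ncol = 2)
coverage <- matrix(c(20, 18, 25, 30, 22, 19, 24, 28), ncol = 2)

# Default threshold (1/total coverage, computed per site).
keepDefault <- filterSites(rMinor, coverage)
# Explicit threshold of 5%.
keepStrict <- filterSites(rMinor, coverage, threshold = 0.05)
```

With these toy values, the fourth site (one minor-allele read out of 58) passes the default filter but is removed by the 5% threshold.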

Value

A list with six named entries:

`freqs`	a list with the allele frequencies, computed by dividing the number of minor-allele reads by the total coverage. Each entry of this list corresponds to a different contig. Each entry is a matrix where each row is a different site and each column is a different population.
`positions`	a list with the positions of each SNP. Each entry of this list is a vector corresponding to a different contig.
`range`	a list with the minimum and maximum SNP position of each contig. Each entry of this list is a vector corresponding to a different contig.
`rMajor`	a list with the number of major-allele reads. Each entry of this list corresponds to a different contig. Each entry is a matrix where each row is a different site and each column is a different population.
`rMinor`	a list with the number of minor-allele reads. Each entry of this list corresponds to a different contig. Each entry is a matrix where each row is a different site and each column is a different population.
`coverage`	a list with the total coverage. Each entry of this list corresponds to a different contig. Each entry is a matrix where each row is a different site and each column is a different population.
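
Because every entry of the returned list is itself a list with one element per contig, per-contig summaries can be obtained with `lapply` or `vapply`. The sketch below uses a hand-built stand-in with the documented shape instead of real prepareFile output: the values are invented, only three of the six entries are mocked, and naming the inner elements by contig (`c1`, `c2`) is an assumption.

```r
# Hand-built stand-in mimicking part of the documented return value.
out <- list(
  freqs = list(c1 = matrix(c(0.10, 0.14), nrow = 1),
               c2 = matrix(c(0.20, 0.03, 0.17, 0.00), nrow = 2)),
  positions = list(c1 = 10, c2 = c(7, 120)),
  range = list(c1 = c(10, 10), c2 = c(7, 120))
)

# Number of retained SNPs per contig (rows of each frequency matrix).
snpsPerContig <- vapply(out$freqs, nrow, integer(1))

# Distance between the first and last SNP of each contig.
contigSpan <- vapply(out$range, function(r) r[2] - r[1], numeric(1))

# Mean minor-allele frequency per population, computed contig by contig.
meanFreq <- lapply(out$freqs, colMeans)
```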

Examples

# load the data from one rc file
data(rc1)

# clean and organize the data in this single file
mydata <- cleanData(file = rc1, pops = 7:10)

# organize the information by contigs
prepareFile(data = mydata, nPops = 4)

[Package poolABC version 1.0.0 Index]