prepareData {poolABC} | R Documentation |
Organize information by contig - for multiple data files
Description
Organize the information of multiple _rc files into different entries for each contig.
Usage
prepareData(data, nPops, filter = FALSE, threshold = NA)
Arguments
data |
is a list with four different entries. The entries should be
named as "rMajor", "rMinor", "coverage" and "info". The |
nPops |
is an integer indicating the total number of different populations in the dataset. |
filter |
is a logical switch, either TRUE or FALSE. If TRUE, the data is filtered by the frequency of the minor allele; if FALSE (the default), that filter is not applied. |
threshold |
is the minimum allowed frequency for the minor allele. Sites where the allelic frequency is below this threshold are removed from the data. |
Details
This function removes all monomorphic sites from the dataset. Monomorphic sites are those where the frequency for all populations is 1 or 0. Then, the name of each contig is used to organize the information on a per-contig basis, so every element of the output is organized by contig. For example, the list with the number of minor-allele reads contains one entry per contig.
If filter is set to TRUE, this function also filters the data by
the frequency of the minor allele. If a threshold is supplied, the computed
frequency is compared to that threshold and sites where the frequency is
below the threshold are removed from the dataset. If no threshold is
supplied, the threshold is assumed to be 1/total coverage, meaning
that a site should have at least one minor-allele read.
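The filtering logic described above can be sketched in base R on toy data. This is only an illustration, not the package's implementation: the helper name filter_sites, the toy matrices, the use of a frequency pooled across populations, and the reading of "monomorphic" as all populations at 0 or all at 1 are assumptions made for the example.

```r
# Hypothetical helper: returns TRUE for the sites (rows) that would be kept.
# rMinor and coverage are matrices with one row per site and one column per
# population; threshold = NA mimics "no threshold supplied".
filter_sites <- function(rMinor, coverage, threshold = NA) {
  # per-population frequency: minor-allele reads divided by total coverage
  freqs <- rMinor / coverage

  # assumption: a site is monomorphic when every population is at 0,
  # or every population is at 1
  monomorphic <- apply(freqs, 1, function(f) all(f == 0) || all(f == 1))

  # assumption: the frequency compared to the threshold is pooled
  # across populations
  totalCov <- rowSums(coverage)
  pooled <- rowSums(rMinor) / totalCov

  # without a threshold, require at least one minor-allele read per site
  cutoff <- if (is.na(threshold)) 1 / totalCov else threshold

  (!monomorphic) & (pooled >= cutoff)
}

# toy data: 4 sites x 2 populations (values are made up)
rMinor   <- matrix(c(0, 0,  1, 0,  5, 3,  0, 2), ncol = 2, byrow = TRUE)
coverage <- matrix(c(20, 25,  40, 60,  30, 30,  50, 50), ncol = 2, byrow = TRUE)

keepDefault <- filter_sites(rMinor, coverage)
keepStrict  <- filter_sites(rMinor, coverage, threshold = 0.05)
```

With the default cutoff, only the first toy site (no minor-allele reads in any population) is dropped. With threshold = 0.05, the sites with pooled frequencies of 0.01 and 0.02 are dropped as well.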
Value
a list with six named entries:
freqs |
a list with the allele frequencies, computed by dividing the number of minor-allele reads by the total coverage. Each entry of this list corresponds to a different contig. Each entry is a matrix where each row is a different site and each column is a different population. |
positions |
a list with the positions of each SNP. Each entry of this list is a vector corresponding to a different contig. |
range |
a list with the minimum and maximum SNP position of each contig. Each entry of this list is a vector corresponding to a different contig. |
rMajor |
a list with the number of major-allele reads. Each entry of this list corresponds to a different contig. Each entry is a matrix where each row is a different site and each column is a different population. |
rMinor |
a list with the number of minor-allele reads. Each entry of this list corresponds to a different contig. Each entry is a matrix where each row is a different site and each column is a different population. |
coverage |
a list with the total coverage. Each entry of this list corresponds to a different contig. Each entry is a matrix where each row is a different site and each column is a different population. |
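To make the shape of the returned list concrete, the following base R sketch builds a list with the same six entry names from toy data. It is a hedged illustration of the per-contig layout described above, not the code used by prepareData: the input vectors contig and position, the helper by_contig, and the toy read counts (including computing the major-allele reads as coverage minus minor-allele reads) are assumptions.

```r
# Hypothetical flat input: one element or row per site (values are made up)
contig   <- c("c1", "c1", "c2", "c2", "c2")
position <- c(10, 250, 5, 80, 400)
rMinor   <- matrix(c(1, 0,  5, 3,  0, 2,  4, 4,  2, 1), ncol = 2, byrow = TRUE)
coverage <- matrix(c(40, 60,  30, 30,  50, 50,  20, 40,  10, 10),
                   ncol = 2, byrow = TRUE)
# toy assumption: every read is either a major-allele or a minor-allele read
rMajor <- coverage - rMinor

# row indices of the sites that belong to each contig
idx <- split(seq_along(contig), contig)

# split a site x population matrix into one matrix per contig;
# drop = FALSE keeps the matrix shape even for contigs with a single site
by_contig <- function(m) lapply(idx, function(i) m[i, , drop = FALSE])

out <- list(
  freqs     = by_contig(rMinor / coverage),
  positions = split(position, contig),
  range     = lapply(split(position, contig), range),
  rMajor    = by_contig(rMajor),
  rMinor    = by_contig(rMinor),
  coverage  = by_contig(coverage)
)

# each entry holds one element per contig
str(out$range)
```

Each of the six entries is itself a list with one element per contig, so out$freqs$c2 is a 3 x 2 matrix (three sites, two populations) and out$range$c2 holds the minimum and maximum SNP position of that contig.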
Examples
# load the data from two rc files
data(rc1, rc2)
# combine both files into a single list
mydata <- list(rc1, rc2)
# clean and organize the data for both files
mydata <- lapply(mydata, function(i) cleanData(file = i, pops = 7:10))
# organize the information by contigs
prepareData(data = mydata, nPops = 4)