haplofreq {optiSel}    R Documentation
Evaluates the Occurrence of Haplotype Segments in Particular Breeds
Description
For each haplotype from thisBreed and every SNP, the occurrence of the haplotype segment containing the SNP is evaluated in a set of reference breeds. The function computes the maximum frequency that each segment has in any one of these reference breeds and identifies the breed in which the segment attains this maximum frequency. Results are either returned as a list or saved to files.
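The core computation can be illustrated with a small base R sketch. It is not optiSel's implementation. It assumes that haplotypes are stored as marker-by-haplotype matrices, that the segment around a SNP has already been chosen (rows 2 to 5 here), and that a segment "occurs" in a reference haplotype only when all of its alleles match exactly. The names segment_freq, target, rows and ref, and the breeds BreedA and BreedB, are hypothetical.

```r
# Minimal sketch, not optiSel's implementation. Haplotypes are stored as
# matrices with one row per SNP and one column per haplotype (alleles 0/1).

# Share of reference haplotypes that carry exactly the alleles of `segment`
# at every SNP row in `rows`.
segment_freq <- function(ref_haps, segment, rows) {
  window <- ref_haps[rows, , drop = FALSE]
  mean(colSums(window == segment) == length(rows))
}

target <- c(0, 1, 1, 0, 1, 0)   # one haplotype from thisBreed
rows   <- 2:5                   # assumed segment containing SNP 3

ref <- list(                    # hypothetical reference breeds
  BreedA = matrix(c(0, 1, 1, 0, 1, 0,
                    1, 1, 1, 0, 1, 1,
                    0, 0, 1, 0, 1, 0,
                    0, 1, 1, 1, 1, 0), nrow = 6),
  BreedB = matrix(c(1, 1, 1, 0, 1, 1,
                    0, 1, 0, 0, 1, 0,
                    0, 0, 0, 0, 0, 0,
                    1, 1, 1, 1, 1, 1), nrow = 6)
)

# Segment frequency in each reference breed
freqs <- vapply(ref, segment_freq, numeric(1),
                segment = target[rows], rows = rows)
max_freq   <- max(freqs)                 # maximum frequency over breeds
best_breed <- names(which.max(freqs))    # breed attaining the maximum
```

Here the segment is found in 2 of 4 BreedA haplotypes and 1 of 4 BreedB haplotypes, so BreedA is reported as the breed with maximum frequency.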
Usage
haplofreq(files, phen, map, thisBreed, refBreeds="others", minSNP=20, minL=1.0,
unitL="Mb", ubFreq=0.01, keep=NULL, skip=NA, cskip=NA, w.dir=NA,
what=c("freq", "match"), cores=1, quiet=FALSE)
Arguments
files: Either a character vector with file names, or a list containing character vectors with file names. The files contain phased genotypes, one file for each chromosome. File names must contain the chromosome name as specified in the If
phen: Data frame containing the ID (column
map: Data frame providing the marker map with columns including marker name
thisBreed: Name of a breed from column
refBreeds: Vector with names of breeds from column
minSNP: Minimum number of marker SNPs included in a segment.
minL: Minimum length of a segment in
unitL: The unit for measuring the length of a segment. Possible units are the number of marker SNPs included in the segment (
ubFreq: If a haplotype segment has frequency smaller than
keep: Subset of the IDs of the individuals from data frame
skip: Take line
cskip: Take column
w.dir: Output file directory. Writing results to files has the advantage that much less working memory is required. By default, no files are created. The function returns only the file names if files are created.
what: For
cores: Number of cores to be used for parallel processing of chromosomes. By default one core is used. For
quiet: Should console output be suppressed?
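minSNP and minL act as two lower bounds on a segment. As a sketch under stated assumptions, the hypothetical helper meets_thresholds below accepts a window of consecutive markers only if it contains at least minSNP markers and spans at least minL units. It assumes the span is the position difference between the first and last marker, with positions in Mb to match the default unitL="Mb". How haplofreq actually selects segments is not described on this page, so this only illustrates the two thresholds.

```r
# Hypothetical helper (not part of optiSel): checks whether the window of
# consecutive markers from index `from` to `to` meets both thresholds.
# `pos` holds marker positions in the unit chosen by unitL (here Mb), and the
# span is assumed to be the distance between the first and last marker.
meets_thresholds <- function(pos, from, to, minSNP = 20, minL = 1.0) {
  n_snp   <- to - from + 1          # number of markers in the window
  seg_len <- pos[to] - pos[from]    # window length in units of pos
  n_snp >= minSNP && seg_len >= minL
}

pos <- seq(0, 3, by = 0.1)          # 31 markers spaced 0.1 Mb apart

ok_default <- meets_thresholds(pos, 1, 21)             # 21 SNPs, 2.0 Mb
too_few    <- meets_thresholds(pos, 1, 10)             # only 10 SNPs
too_short  <- meets_thresholds(pos, 1, 6, minSNP = 5)  # 6 SNPs, 0.5 Mb
```

Requiring both conditions means a dense region with many SNPs still fails if it is physically short, and a long region fails if it is sparsely covered by markers.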
Details
Marker file format: Each marker file containing phased genotypes has a header and no row names. Cells are separated by blank spaces. The number of rows equals the number of markers on the respective chromosome, and the markers are in the same order as in map. The first cskip columns are ignored. The remaining columns contain the genotypes of individuals, written as two alleles separated by a character, e.g. A/B, 0/1, A|B, A B, or 0 1. The same two symbols must be used for all markers. Column names are the IDs of the individuals. If the blank space is used as separator, then the ID of each individual should be repeated in the header to get a regular delimited file. The columns to be skipped and the individual IDs must not contain white space.
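To make the marker file format concrete, the following base R sketch reads a tiny in-memory file that uses "/" as allele separator and has two leading columns to be skipped (held in a cskip-like variable), then splits each genotype into two haplotype columns, giving one row per marker and 2N haplotype columns. It is a hypothetical illustration of the format, not the parser used by haplofreq; the marker names, the Pos column and the individual IDs are made up.

```r
# Hypothetical file content in the described format: header, no row names,
# blank-space separated cells, genotypes written as two alleles joined by "/".
txt <- c(
  "Name Pos Angler1 Angler2",
  "snp1 10 A/B A/A",
  "snp2 20 B/B A/B",
  "snp3 30 A/B B/A"
)
cskip <- 2  # the first two columns (marker name, position) are ignored

geno <- read.table(text = txt, header = TRUE, colClasses = "character")
geno <- geno[, -seq_len(cskip), drop = FALSE]

# Split a vector of genotype strings into a two-column matrix of alleles
split_genotypes <- function(g, sep = "/") {
  parts <- strsplit(g, sep, fixed = TRUE)
  cbind(vapply(parts, `[`, character(1), 1),
        vapply(parts, `[`, character(1), 2))
}

# One column per haplotype: M markers x 2N haplotypes
haps <- do.call(cbind, lapply(geno, split_genotypes))
colnames(haps) <- paste0(rep(names(geno), each = 2), c(".1", ".2"))
```

With a blank space as allele separator the header would instead repeat each ID, as noted above, and the two allele columns per individual could be read directly without splitting.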
Value
If w.dir=NA, then a list is returned. The list may have the following components:
freq: M x (2N) matrix containing for every SNP and for each of the 2N haplotypes from
match: M x (2N) matrix containing for every SNP and for each of the 2N haplotypes from
The list has attributes thisBreed, and map.
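The relationship between the freq and match components can be sketched as follows, assuming (hypothetically) that segment frequencies have already been computed for every SNP, haplotype and reference breed and stored in a three-dimensional array. Taking the maximum over breeds gives an M x (2N) frequency matrix, and which.max gives the breed attaining it. The array f, the breed assignment of its values, and the tie-breaking rule are assumptions for illustration only.

```r
# Hypothetical segment frequencies: SNPs x haplotypes x reference breeds
breeds <- c("Rotbunt", "Holstein")
f <- array(c(0.2, 0.1, 0.5, 0.1,    # Rotbunt
             0.4, 0.0, 0.3, 0.6),   # Holstein
           dim = c(2, 2, 2),
           dimnames = list(c("snp1", "snp2"), c("hap1", "hap2"), breeds))

# M x (2N) matrix of maximum frequencies over the reference breeds
freq_mat <- apply(f, c(1, 2), max)

# M x (2N) matrix of the breed in which each segment has maximum frequency
best      <- apply(f, c(1, 2), which.max)
match_mat <- matrix(breeds[best], nrow = nrow(best), dimnames = dimnames(best))
```

Note that which.max returns the first maximum, so in this sketch ties would be resolved in favour of the breed listed first; how haplofreq breaks ties is not stated on this page.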
If w.dir is the name of a directory, then the results are written to files, one file per chromosome, and a data frame containing the file names is returned.
Author(s)
Robin Wellmann
Examples
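## Example marker map and cattle data shipped with optiSel
data(map)
data(Cattle)
## Phased genotype files for chromosomes 1 and 2
dir <- system.file("extdata", package="optiSel")
files <- file.path(dir, paste("Chr", 1:2, ".phased", sep=""))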
Freq <- freqlist(
haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Rotbunt", minL=2.0),
haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Holstein", minL=2.0),
haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Fleckvieh", minL=2.0)
)
plot(Freq, ID=1, hap=2, refBreed="Rotbunt")
plot(Freq, ID=1, hap=2, refBreed="Holstein", Chr=1)
## Test for using multiple cores:
Freq1 <- haplofreq(files, Cattle, map, thisBreed="Angler", refBreeds="Rotbunt",
minL=2.0, cores=NA)$freq
range(Freq[[1]]-Freq1)
#[1] 0 0
## Creating output files with allele frequencies and allele origins:
rdir <- system.file("extdata", package = "optiSel")
wdir <- file.path(tempdir(), "HaplotypeEval")
chr <- unique(map$Chr)
files <- file.path(rdir, paste("Chr", chr, ".phased", sep=""))
wfile <- haplofreq(files, Cattle, map, thisBreed="Angler", minL=2.0, w.dir=wdir)
View(read.table(wfile$match[1],skip=1))
#unlink(wdir, recursive = TRUE)