estimate_IBD {polyqtlR} | R Documentation |
Generate IBD probabilities from marker genotypes and a phased linkage map
Description
estimate_IBD
is a function for creating identity-by-descent (IBD) probabilities. Two computational methods are offered:
by default, IBD probabilities are estimated using hidden Markov models, but a heuristic method based on Bourke et al. (2014) is also included.
The basic input data for this function are marker genotypes (either discrete marker dosages, i.e. scores 0, 1, ..., ploidy representing the number of copies of the marker allele,
or the probabilities of these dosages) and a phased linkage map. Details on each of the methods are included under the method argument.
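As an illustration of the discrete input described above, the following is a minimal sketch of a tetraploid dosage matrix and a range check on its scores. The orientation (markers in rows, parents and offspring in columns), the marker and individual names, and the use of NA for missing scores are assumptions made for this sketch, not details taken from this page.

```r
# Hypothetical toy data: 3 markers scored in 2 parents and 3 F1 offspring.
ploidy <- 4
dosages <- matrix(
  c(1L, 0L, 1L, 0L,          1L,
    2L, 1L, 2L, 1L,          3L,
    0L, 1L, 0L, NA_integer_, 1L),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("mrk", 1:3),
                  c("P1", "P2", "F1_1", "F1_2", "F1_3"))
)

# Every non-missing score should be an integer between 0 and ploidy.
valid <- is.na(dosages) | (dosages >= 0 & dosages <= ploidy)
all_valid <- all(valid)
```

A check of this kind can catch scoring errors (for example a dosage of 5 in a tetraploid) before the data are passed on.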
Usage
estimate_IBD(
input_type = "discrete",
genotypes,
phased_maplist,
method = "hmm",
remove_markers = NULL,
ploidy,
ploidy2 = NULL,
parent1 = "P1",
parent2 = "P2",
individuals = "all",
log = NULL,
map_function = "haldane",
bivalent_decoding = TRUE,
error = 0.01,
full_multivalent_hexa = FALSE,
verbose = FALSE,
ncores = 1,
fix_threshold = 0.1,
factor_dist = 1
)
Arguments
input_type |
Can be either one of 'discrete' or 'probabilistic'. For the former (default), |
genotypes |
Marker genotypes, either a 2d matrix of integer marker scores or a data.frame of dosage probabilities. Details are as follows:
|
phased_maplist |
A list of phased linkage maps, the output of |
method |
The method used to estimate IBD probabilities, either |
remove_markers |
Optional vector of marker names to remove from the maps. Default is NULL. |
ploidy |
Integer. Ploidy of the organism. |
ploidy2 |
Optional integer, by default NULL. |
parent1 |
Identifier of parent 1, by default assumed to be "P1" |
parent2 |
Identifier of parent 2, by default assumed to be "P2" |
individuals |
By default "all" offspring are included; otherwise a subset can be selected using a vector of offspring index numbers (1, 2, etc.)
according to their order in |
log |
Character string specifying the log filename to which standard output should be written. If |
map_function |
Mapping function to use when converting map distances to recombination frequencies.
Currently only |
bivalent_decoding |
Option to consider only bivalent pairing during formation of gametes (ignored for diploid populations, as only bivalents are possible there), by default TRUE |
error |
The (prior) probability of errors in the offspring dosages, usually assumed to be small but non-zero |
full_multivalent_hexa |
Option to allow multivalent pairing in both parents at the hexaploid level, by default FALSE |
verbose |
Logical, by default FALSE |
ncores |
How many CPU cores should be used in the evaluation? By default 1 core is used. |
fix_threshold |
If |
factor_dist |
If |
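For the map_function argument, Haldane's mapping function relates a map distance d (in cM) to a recombination frequency r via r = 0.5 * (1 - exp(-2d/100)). The sketch below implements that textbook formula and its inverse; the function names haldane_rf and haldane_cM are hypothetical and are not part of polyqtlR, whose internal implementation may differ.

```r
# Haldane's mapping function: map distance (cM) -> recombination frequency.
haldane_rf <- function(d_cM) {
  0.5 * (1 - exp(-2 * d_cM / 100))
}

# Inverse: recombination frequency (0 <= r < 0.5) -> map distance (cM).
haldane_cM <- function(r) {
  -50 * log(1 - 2 * r)
}

# Short distances give r close to d/100; long distances approach r = 0.5.
rf_values <- haldane_rf(c(0, 1, 10, 50, 200))
```

Because the function assumes no crossover interference, recombination frequencies never exceed 0.5, however far apart two markers lie.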
Value
A list of IBD probabilities, organised by linkage group (as given in the input phased_maplist). Each
list item is itself a list containing the following:
- IBDtype
The type of IBD; for this function only "genotypeIBD" is calculated.
- IBDarray
A 3d array of IBD probabilities, with dimensions marker, genotype-class and F1 individual.
- map
A 3-column data-frame specifying chromosome, marker and position (in cM)
- parental_phase
Phasing of the markers in the parents, as given in the input phased_maplist
- marginal.likelihoods
A list of marginal likelihoods of different valencies if method "hmm" was used, otherwise NULL
- valency
The predicted valency that maximised the marginal likelihood, per offspring. For method "heur", NULL
- offspring
Offspring names
- biv_dec
Logical, whether bivalent decoding was used in the estimation of the F1 IBD probabilities.
- gap
The size of the gap (in cM) used when interpolating the IBD probabilities. See function spline_IBD for details.
- genocodes
Ordered list of genotype codes used to represent different genotype classes.
- pairing
Log likelihoods of each of the different pairing scenarios considered (can be used e.g. for a post-mapping check of preferential pairing)
- ploidy
Ploidy of parent 1
- ploidy2
Ploidy of parent 2
- method
The method used, either "hmm" (default) or "heur". See argument method
- error
The error prior used, if method "hmm" was used, otherwise NULL
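To make the nested structure of the returned list concrete, the sketch below builds a small mock object with the same layout (one linkage group, a 3d IBDarray with dimensions marker, genotype class and F1 individual) and shows how it might be navigated. All names and values here (LG1, mrk1, class1, F1_1, the map column names and the random probabilities) are hypothetical placeholders, not real estimate_IBD output.

```r
set.seed(1)
n_mrk <- 3; n_class <- 4; n_ind <- 2

# Random positive values, rescaled so that for each marker and individual
# the probabilities over genotype classes sum to 1 (mock data only).
raw <- array(runif(n_mrk * n_class * n_ind), dim = c(n_mrk, n_class, n_ind))
totals <- apply(raw, c(1, 3), sum)
IBDarray <- sweep(raw, c(1, 3), totals, "/")
dimnames(IBDarray) <- list(paste0("mrk", 1:n_mrk),
                           paste0("class", 1:n_class),
                           paste0("F1_", 1:n_ind))

# Mock of the per-linkage-group list described above (subset of the items).
IBD_mock <- list(
  LG1 = list(
    IBDtype   = "genotypeIBD",
    IBDarray  = IBDarray,
    map       = data.frame(chromosome = 1,
                           marker = paste0("mrk", 1:n_mrk),
                           position = c(0, 5, 12)),
    offspring = paste0("F1_", 1:n_ind)
  )
)

# Extract the marker x genotype-class slice for one offspring.
lg <- IBD_mock[["LG1"]]
probs_ind1 <- lg$IBDarray[, , "F1_1"]
row_sums <- rowSums(probs_ind1)

# Most probable genotype class at each marker for that offspring.
best_class <- colnames(probs_ind1)[apply(probs_ind1, 1, which.max)]
```

The same indexing pattern should carry over to real output, assuming the array dimensions are named; if they are not, integer indices can be used instead.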
References
Durbin R, Eddy S, Krogh A, Mitchison G (1998) Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge: Cambridge University Press.
Hackett et al. (2013) Linkage analysis and QTL mapping using SNP dosage data in a tetraploid potato mapping population. PLoS One 8(5): e63939.
Zheng et al. (2016) Probabilistic multilocus haplotype reconstruction in outcrossing tetraploids. Genetics 203: 119-131.
Bourke P.M. (2014) QTL analysis in polyploids: Model testing and power calculations. Wageningen University (MSc thesis).
Examples
data("phased_maplist.4x", "SNP_dosages.4x")
estimate_IBD(phased_maplist = phased_maplist.4x,
             genotypes = SNP_dosages.4x,
             ploidy = 4)