read_geno_prob {mappoly} | R Documentation |
Data Input
Description
Reads an external data file containing genotype probabilities. The format of the file is described in the Details
section. This function creates an object of class mappoly.data.
Usage
read_geno_prob(
file.in,
prob.thres = 0.95,
filter.non.conforming = TRUE,
elim.redundant = TRUE,
verbose = TRUE
)
Arguments
file.in |
a character string with the name of (or full path to) the input file which contains the data to be read |
prob.thres |
probability threshold to associate a marker call to a
dosage. Markers with maximum genotype probability smaller than prob.thres are treated as missing data in the geno.dose matrix (default = 0.95) |
filter.non.conforming |
if |
elim.redundant |
logical. If |
verbose |
if |
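The effect of prob.thres can be illustrated with a minimal sketch in base R. The helper call_dosage below is hypothetical and is not part of mappoly; it assumes one row of probabilities per marker-individual pair, one column per dosage from 0 to the ploidy level, and it uses NA as the missing-data code purely for illustration.

```r
# Hypothetical helper (not a mappoly function): turn genotype
# probabilities into dosage calls using a probability threshold.
call_dosage <- function(prob, prob.thres = 0.95) {
  # prob: numeric matrix, rows = marker-individual pairs,
  #       columns = dosages 0, 1, ..., ploidy
  apply(prob, 1, function(p) {
    # Missing input, or no dosage reaching the threshold: no call
    if (anyNA(p) || max(p) < prob.thres) return(NA_integer_)
    # Columns are 1-based, dosages are 0-based
    which.max(p) - 1L
  })
}

# Toy probabilities for a tetraploid (5 possible dosages)
prob <- rbind(c(0.00, 0.98, 0.02, 0, 0),  # confident call
              c(0.60, 0.40, 0.00, 0, 0),  # below the threshold
              c(NA, NA, NA, NA, NA))      # missing data
dose <- call_dosage(prob, prob.thres = 0.95)
```

Lowering prob.thres in this sketch turns more ambiguous rows into calls, at the cost of accepting less certain genotypes.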
Details
The first line of the input file contains the string ploidy followed by the ploidy level of the parents. The second and third lines contain the strings n.ind and n.mrk followed by the number of individuals in the dataset and the total number of markers, respectively. Lines 4 and 5 contain the strings mrk.names and ind.names followed by the names of the markers and the names of the individuals, respectively. Lines 6 and 7 contain the strings dosageP and dosageQ followed by the dosages of all markers in parents P and Q, respectively.
Line 8 contains the string seq followed by a sequence of integers indicating the chromosome to which each marker belongs. Any a priori information regarding the physical distance between markers can be used here: for example, these numbers could refer to chromosomes, scaffolds, or even contigs on which the markers are positioned. If this information is not available for a particular marker, NA should be used. If it is not available for any of the markers, the string seq should be followed by a single NA. Line 9 contains the string seqpos followed by the physical position of each marker within its sequence. The physical position can be given in any unit of physical genomic distance (base pairs, for instance). However, the user should be able to make decisions based on these values, such as the occurrence of crossovers.
Line 10 should contain the string nphen followed by the number of phenotypic traits. Line 11 is skipped (it is usually used as a spacer). Each of the following lines contains the name of a phenotypic trait, with no space characters, followed by the phenotypic values; there should be one such line per phenotypic trait, and NA represents missing values. Line 12 + nphen is skipped. Finally, the last element is a table containing the probability distribution for each combination of marker and offspring. The first two columns represent the marker and the offspring, respectively. The remaining columns represent the probability associated with each of the possible dosages. NA represents missing data.
Value
An object of class mappoly.data, which contains a
list with the following components:
ploidy |
ploidy level |
n.ind |
number of individuals |
n.mrk |
total number of markers |
ind.names |
the names of the individuals |
mrk.names |
the names of the markers |
dosage.p1 |
a vector containing the dosage in
parent P for all |
dosage.p2 |
a vector containing the dosage in
parent Q for all |
chrom |
a vector indicating the sequence to which each marker belongs. Zero indicates that the marker was not assigned to any sequence |
genome.pos |
physical position of the markers within the sequence |
seq.ref |
NULL (unused in this type of data) |
seq.alt |
NULL (unused in this type of data) |
all.mrk.depth |
NULL (unused in this type of data) |
prob.thres |
probability threshold to associate a marker call to a dosage. Markers with maximum genotype probability smaller than prob.thres are treated as missing data in the geno.dose matrix |
geno.dose |
a matrix containing the dosage for each marker (rows)
for each individual (columns). Missing data are represented by
|
geno |
a data.frame
containing the probability distribution for each combination of
marker and offspring. The first two columns represent the marker
and the offspring, respectively. The remaining elements represent
the probability associated with each one of the possible
dosages. Missing data are converted from NA to the expected
segregation ratio using function |
n.phen |
number of phenotypic traits |
phen |
a matrix containing the phenotypic data. The rows correspond to the traits and the columns correspond to the individuals |
chisq.pval |
a vector containing p-values related to the chi-squared test of Mendelian segregation performed for all markers |
kept |
if elim.redundant = TRUE, holds all non-redundant markers |
elim.correspondence |
if elim.redundant = TRUE, holds all non-redundant markers and their correspondence to the redundant ones |
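The expected segregation ratio mentioned under geno and the chi-squared test behind chisq.pval can be illustrated with a minimal sketch. The function expected_segregation below is hypothetical and is not the mappoly implementation; it assumes polysomic inheritance with random chromosome pairing and no double reduction, so that the dosage of each gamete follows a hypergeometric distribution.

```r
# Hypothetical helper (not a mappoly function): expected offspring
# dosage frequencies for parental dosages dP and dQ, assuming random
# pairing and no double reduction.
expected_segregation <- function(ploidy, dP, dQ) {
  n_gam <- ploidy / 2
  # Gamete dosage distribution: draw ploidy/2 homologs without replacement
  gP <- dhyper(0:n_gam, dP, ploidy - dP, n_gam)
  gQ <- dhyper(0:n_gam, dQ, ploidy - dQ, n_gam)
  # Offspring dosage is the sum of the two gamete dosages (convolution)
  out <- numeric(ploidy + 1)
  for (i in seq_along(gP)) {
    for (j in seq_along(gQ)) {
      out[i + j - 1] <- out[i + j - 1] + gP[i] * gQ[j]
    }
  }
  names(out) <- 0:ploidy
  out
}

# Tetraploid marker with dosage 1 in parent P and 0 in parent Q
exp_freq <- expected_segregation(ploidy = 4, dP = 1, dQ = 0)

# Toy observed counts for dosages 0..4 (made-up numbers)
obs <- c(48, 52, 0, 0, 0)

# Test only the classes that are expected to occur
keep <- exp_freq > 0
pval <- chisq.test(x = obs[keep], p = exp_freq[keep])$p.value
```

A small p-value in this sketch would indicate that the observed dosage counts depart from the expected Mendelian ratio for that marker.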
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., Olukolu, B. A., Pereira, G. da S., Khan, A., Gemenet, D., Yencho, G. C., and Zeng, Z-B. (2020) Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400620
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
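#### Tetraploid Example
library(mappoly)
# URL of the sample file with genotype probabilities
ft <- "https://raw.githubusercontent.com/mmollina/MAPpoly_vignettes/master/data/hexa_sample"
# Download to a temporary file and read it
tempfl <- tempfile()
download.file(ft, destfile = tempfl)
SolCAP.dose.prob <- read_geno_prob(file.in = tempfl)
# Summarize and plot the resulting mappoly.data object
print(SolCAP.dose.prob, detailed = TRUE)
plot(SolCAP.dose.prob)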