merge_datasets {mappoly} | R Documentation |
Merge datasets
Description
This function merges two datasets of class mappoly.data. This can be useful
when individuals of a population were genotyped using two or more techniques
and the resulting datasets are in different files or formats. Please note that the datasets
should contain the same number of individuals, and each individual must be named identically
in both datasets (e.g. Ind_1 in both datasets, not Ind_1 in one dataset and ind_1 or Ind.1
in the other).
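The naming requirement above can be checked before merging. The following is a minimal base R sketch, assuming only that each dataset exposes the documented `ind.names` component. The two lists are mock stand-ins for mappoly.data objects, and `check_ind_names` is a hypothetical helper, not part of mappoly.

```r
# Mock stand-ins for two mappoly.data objects (assumption: only the
# documented `ind.names` component is needed for this check).
dat.a <- list(ind.names = c("Ind_1", "Ind_2", "Ind_3"))
dat.b <- list(ind.names = c("Ind_1", "ind_2", "Ind.3"))

# Hypothetical helper (not part of mappoly): compares the number of
# individuals and lists the names that appear in only one dataset.
check_ind_names <- function(d1, d2) {
  list(
    same.count = length(d1$ind.names) == length(d2$ind.names),
    only.in.1  = setdiff(d1$ind.names, d2$ind.names),
    only.in.2  = setdiff(d2$ind.names, d1$ind.names)
  )
}

res <- check_ind_names(dat.a, dat.b)
print(res)
```

If `only.in.1` or `only.in.2` is non-empty, the names probably need to be harmonized before calling merge_datasets.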
Usage
merge_datasets(dat.1 = NULL, dat.2 = NULL)
Arguments
dat.1 |
the first dataset of class mappoly.data |
dat.2 |
the second dataset of class mappoly.data |
Value
An object of class mappoly.data
which contains all markers
from both datasets. It will be a list with the following components:
ploidy |
ploidy level |
n.ind |
number of individuals |
n.mrk |
total number of markers |
ind.names |
the names of the individuals |
mrk.names |
the names of the markers |
dosage.p1 |
a vector containing the dosage in parent P for all markers |
dosage.p2 |
a vector containing the dosage in parent Q for all markers |
chrom |
a vector indicating the sequence to which each marker belongs. Zero indicates that the marker was not assigned to any sequence |
genome.pos |
physical position of the markers in the sequence |
seq.ref |
if one or both datasets originated from read_vcf, it keeps the reference alleles from the sequencing platform; otherwise it is NULL |
seq.alt |
if one or both datasets originated from read_vcf, it keeps the alternative alleles from the sequencing platform; otherwise it is NULL |
all.mrk.depth |
if one or both datasets originated from read_vcf, it keeps the marker read depths from sequencing; otherwise it is NULL |
prob.thres |
(unused field) |
geno.dose |
a matrix containing the dosage for each marker (rows)
for each individual (columns). Missing data are represented by
|
geno |
if both datasets contain genotype distribution information, the final object will contain 'geno'. This is set to NULL otherwise |
nphen |
(0) |
phen |
(NULL) |
chisq.pval |
a vector containing p-values related to the chi-squared test of Mendelian segregation performed for all markers in both datasets |
kept |
if elim.redundant = TRUE when reading any dataset, holds all non-redundant markers |
elim.correspondence |
if elim.redundant = TRUE when reading any dataset, holds all non-redundant markers and their equivalence to the redundant ones |
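As an illustration of how the components listed above relate to each other, the sketch below runs a few dimension checks on a mock list. The values are made up, `check_dims` is a hypothetical helper rather than a mappoly function, and the assumption that the parental dosage vectors have one entry per marker is inferred from the descriptions above.

```r
# Mock object carrying a few of the documented components (made-up values).
mock <- list(
  ploidy    = 6,
  n.ind     = 3,
  n.mrk     = 2,
  ind.names = c("Ind_1", "Ind_2", "Ind_3"),
  mrk.names = c("M1", "M2"),
  dosage.p1 = c(1, 2),
  dosage.p2 = c(0, 3),
  # markers in rows, individuals in columns, as described for geno.dose
  geno.dose = matrix(c(1, 2, 0, 3, 1, 2), nrow = 2,
                     dimnames = list(c("M1", "M2"),
                                     c("Ind_1", "Ind_2", "Ind_3")))
)

# Hypothetical helper (not part of mappoly): returns one logical per check.
check_dims <- function(x) {
  c(markers     = length(x$mrk.names) == x$n.mrk,
    individuals = length(x$ind.names) == x$n.ind,
    parents     = length(x$dosage.p1) == x$n.mrk &&
                  length(x$dosage.p2) == x$n.mrk,
    geno.rows   = nrow(x$geno.dose) == x$n.mrk,
    geno.cols   = ncol(x$geno.dose) == x$n.ind)
}

ok <- check_dims(mock)
print(ok)
```

The same checks could in principle be applied to the list returned by merge_datasets, since they use only component names documented in this section.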
Author(s)
Gabriel Gesteira, gdesiqu@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
## Loading a subset of SNPs from chromosomes 3 and 12 of sweetpotato dataset
## (SNPs anchored to Ipomoea trifida genome)
dat <- NULL
for(i in c(3, 12)){
  cat("Loading chromosome", i, "...\n")
  tempfl <- tempfile(pattern = paste0("ch", i), fileext = ".vcf.gz")
  x <- "https://github.com/mmollina/MAPpoly_vignettes/raw/master/data/sweet_sample_ch"
  address <- paste0(x, i, ".vcf.gz")
  download.file(url = address, destfile = tempfl)
  dattemp <- read_vcf(file = tempfl, parent.1 = "PARENT1", parent.2 = "PARENT2",
                      ploidy = 6, verbose = FALSE)
  dat <- merge_datasets(dat, dattemp)
  cat("\n")
}
dat
plot(dat)