R: Merge datasets

merge_datasets {mappoly}

R Documentation

Merge datasets

Description

This function merges two datasets of class mappoly.data. This can be useful when individuals of a population were genotyped using two or more techniques and the resulting datasets are stored in different files or formats. Please note that both datasets should contain the same number of individuals, and each individual must be named identically in both datasets (e.g. Ind_1 in both datasets, not Ind_1 in one dataset and ind_1 or Ind.1 in the other).
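The naming requirement above can be checked before merging. The following is a minimal sketch in base R, not part of mappoly: `check_ind_names` is a hypothetical helper, and the two toy lists stand in for real `mappoly.data` objects by carrying only the `ind.names` component.

```r
# Hypothetical helper (not a mappoly function): reports individuals whose
# names appear in only one of the two datasets.
check_ind_names <- function(dat.1, dat.2) {
  only.1 <- setdiff(dat.1$ind.names, dat.2$ind.names)
  only.2 <- setdiff(dat.2$ind.names, dat.1$ind.names)
  list(ok = length(only.1) == 0 && length(only.2) == 0,
       only.in.dat.1 = only.1,
       only.in.dat.2 = only.2)
}

# Toy stand-ins: same three individuals, but spelled differently in `b`
a <- list(ind.names = c("Ind_1", "Ind_2", "Ind_3"))
b <- list(ind.names = c("Ind_1", "ind_2", "Ind.3"))

res <- check_ind_names(a, b)
res$ok             # FALSE: the names do not match exactly
res$only.in.dat.1  # names to reconcile on the dat.1 side
res$only.in.dat.2  # names to reconcile on the dat.2 side
```

With real data, the same call would take the two `mappoly.data` objects directly, since both expose `ind.names`.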

Usage

merge_datasets(dat.1 = NULL, dat.2 = NULL)

Arguments

`dat.1`	the first dataset of class `mappoly.data` to be merged
`dat.2`	the second dataset of class `mappoly.data` to be merged (default = NULL); if `dat.2 = NULL`, the function returns `dat.1` only
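
Both arguments are expected to be of class `mappoly.data` (or `NULL`). As an illustration only, a caller could guard its inputs as sketched below; `is_mappoly_input` is a hypothetical helper written in base R, and `toy` is a stand-in object rather than a real dataset.

```r
# Hypothetical guard (not a mappoly function): accepts NULL or any object
# carrying the class "mappoly.data".
is_mappoly_input <- function(x) {
  is.null(x) || inherits(x, "mappoly.data")
}

# Toy object that only carries the class attribute
toy <- structure(list(ploidy = 6), class = "mappoly.data")

ok.toy  <- is_mappoly_input(toy)           # class matches
ok.null <- is_mappoly_input(NULL)          # NULL is an allowed default
ok.df   <- is_mappoly_input(data.frame())  # wrong class
```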

Value

An object of class mappoly.data which contains all markers from both datasets. It will be a list with the following components:

`ploidy`	ploidy level
`n.ind`	number of individuals
`n.mrk`	total number of markers
`ind.names`	the names of the individuals
`mrk.names`	the names of the markers
`dosage.p1`	a vector containing the dosage in parent P for all `n.mrk` markers
`dosage.p2`	a vector containing the dosage in parent Q for all `n.mrk` markers
`chrom`	a vector indicating the sequence to which each marker belongs. Zero indicates that the marker was not assigned to any sequence
`genome.pos`	physical position of the markers in the sequence
`seq.ref`	if one or both datasets originated from read_vcf, holds the reference alleles from the sequencing platform; otherwise NULL
`seq.alt`	if one or both datasets originated from read_vcf, holds the alternative alleles from the sequencing platform; otherwise NULL
`all.mrk.depth`	if one or both datasets originated from read_vcf, holds the marker read depths from sequencing; otherwise NULL
`prob.thres`	(unused field)
`geno.dose`	a matrix containing the dosage of each marker (rows) for each individual (columns). Missing data are represented by `ploidy_level + 1`
`geno`	if both datasets contain genotype distribution information, the final object will contain 'geno'. This is set to NULL otherwise
`nphen`	(0)
`phen`	(NULL)
`chisq.pval`	a vector containing p-values related to the chi-squared test of Mendelian segregation performed for all markers in both datasets
`kept`	if `elim.redundant = TRUE` was used when reading any dataset, holds all non-redundant markers
`elim.correspondence`	if `elim.redundant = TRUE` was used when reading any dataset, holds all non-redundant markers and their correspondence to the redundant ones
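
Because missing genotypes in `geno.dose` are coded as `ploidy_level + 1` rather than `NA`, summaries of the merged object need to account for that code. The sketch below uses base R only; `dat` is a toy list with made-up values that mimics just the `ploidy` and `geno.dose` components, not a real merged dataset.

```r
# Toy stand-in for a merged hexaploid dataset (illustrative values only)
geno.dose <- matrix(c(0, 1, 7,
                      2, 7, 7,
                      3, 3, 4),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("M1", "M2", "M3"),
                                    c("Ind_1", "Ind_2", "Ind_3")))
dat <- list(ploidy = 6, geno.dose = geno.dose)

# Missing calls are stored as ploidy + 1 (7 for a hexaploid)
is.missing <- dat$geno.dose == dat$ploidy + 1

# Proportion of missing calls per marker (rows are markers)
miss.rate <- rowMeans(is.missing)

# Copy of the dosage matrix with missing calls recoded as NA
geno.na <- dat$geno.dose
geno.na[is.missing] <- NA
```

Recoding to `NA` on a copy leaves the original object untouched, which matters if it is passed on to other functions that expect the `ploidy_level + 1` convention.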

Author(s)

Gabriel Gesteira, gdesiqu@ncsu.edu

References

Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378

Examples


## Loading a subset of SNPs from chromosomes 3 and 12 of the sweetpotato dataset
## (SNPs anchored to the Ipomoea trifida genome)
library(mappoly)
dat <- NULL
for (i in c(3, 12)) {
  cat("Loading chromosome", i, "...\n")
  ## Download the VCF file for chromosome i to a temporary location
  tempfl <- tempfile(pattern = paste0("ch", i), fileext = ".vcf.gz")
  x <- "https://github.com/mmollina/MAPpoly_vignettes/raw/master/data/sweet_sample_ch"
  address <- paste0(x, i, ".vcf.gz")
  download.file(url = address, destfile = tempfl)
  dattemp <- read_vcf(file = tempfl, parent.1 = "PARENT1", parent.2 = "PARENT2",
                      ploidy = 6, verbose = FALSE)
  ## `dat` starts as NULL and accumulates the markers of each chromosome
  dat <- merge_datasets(dat, dattemp)
  cat("\n")
}
dat
plot(dat)

[Package mappoly version 0.4.1 Index]