ghap.simadmix {GHap}R Documentation

Simulate individuals from specified admixture proportions

Description

Generation of simulated haplotypes based on a list of user-defined ancestral populations.

Usage

  ghap.simadmix(object, n.individuals,
                n.generations, ancestors,
                proportions = NULL,
                alpha = NULL,
                out.file,
                only.active.markers = TRUE,
                ncores = 1,verbose = TRUE)

Arguments

object

A GHap.phase object.

n.individuals

Number of individuals to simulate.

n.generations

Number of generations past the admixture event.

ancestors

List of ancestral populations. Each ancestral population is given as a vector of ids of the ancestors belonging to that population.

proportions

A dataframe containing the ancestry proportions in the simulated individuals. The number of columns has to be equal to the number of ancestral populations defined in the ancestors list, and the number of row has to be equal to the number of simulated individuals. See argument 'alpha' if you want the function to generate random samples for admixture proportions.

alpha

A list with same size of the 'ancestors' list, with each value representing the vector of parameters to be used for sampling ancestry proportions from a Direchelet distribution.

out.file

Output file name.

only.active.markers

A logical value specifying whether only active markers should be used in simulations (default = TRUE).

ncores

A numeric value specifying the number of cores to be used in parallel computing (default = 1).

verbose

A logical value specfying whether log messages should be printed (default = TRUE).

Details

Given a list of ancestral populations, this function simulates haplotypes of individuals descending from admixture among these populations after a defined number of generations. The ancestry proportions can be sampled from a Direchelet distribution with a vector of parameters alpha. For example, if three ancestral populations are considered and the parameters are set as alpha = list(pop1 = 1, pop2 = 1, pop3 = 1), the resulting simulated individuals will have a highly diverse configuration of ancestry proportions. On the other hand, by setting alpha = list(pop1 = 0, pop2 = 1, pop3 = 0) for example, all individuals will descend entirely from population 2 without admixture. The user can play with these values to fine tune the desired sampling distribution. Alternatively, exact proportions for each simulated individual can be set through the 'proportions' argument by providing a dataframe containing the desired ancestry values.

Value

The function outputs the following files:

The user can then treat these files as a regular GHap input, or use them to build the Oxford HAPS/SAMPLES format for analysis with other software. The simulated data can be useful in the evaluation of accuracy of different algorithms designed for admixture analysis, as well as of other methods in the field of population genomics.

Author(s)

Yuri Tani Utsunomiya <ytutsunomiya@gmail.com>

Examples


# # Copy the example data in the current working directory
# exfiles <- ghap.makefile(dataset = "example",
#                          format = "phase",
#                          verbose = TRUE)
# file.copy(from = exfiles, to = "./")
# 
# # Load phase data
# phase <- ghap.loadphase("example")
# 
# # Make vectors of ancestors
# pure1 <- unique(phase$id[which(phase$pop == "Pure1")])
# pure2 <- unique(phase$id[which(phase$pop == "Pure2")])
# 
# # Simulate proportions
# ngroup <- 30
# pop1 <- c(rep(1, times = ngroup), # purebred from population 1
#           runif(n = ngroup, min = 0.1, max = 0.9), # admixed individuals
#           rep(0, times = ngroup)) # purebred from population 2
# pop2 <- 1-pop1
# prop <- data.frame(pop1,pop2)
# 
# # Simulate individuals
# set.seed(1988)
# ghap.simadmix(object = phase, n.individuals = nrow(prop),
#               n.generations = 10,
#               ancestors = list(pop1 = pure1, pop2 = pure2),
#               proportions = prop, out.file = "sim")
# ghap.compress(input.file = "sim", out.file = "sim")
# sim <- ghap.loadphase("sim")
# 
# # Unsupervised analysis with K = 2
# prototypes <- ghap.anctrain(object = sim, K = 2)
# hapadmix <- ghap.anctest(object = sim,
#                          prototypes = prototypes,
#                          test = unique(sim$id))
# anctracks <- ghap.ancsmooth(object = sim, admix = hapadmix)
# 
# # Load simulated ancestry proportions
# ancsim <- NULL
# ancsim$proportions2 <- read.table(file = "sim.proportions", header=T)
# ancsim$haplotypes <- read.table(file = "sim.haplotypes", header=T)
# 
# # Compare estimates with real values
# # Obs: the original populations had introgression from each other
# # Admixture seen in the estimates of purebreds reflect that introgression
# ghap.ancplot(ancsmooth = ancsim)
# ghap.ancplot(ancsmooth = anctracks)
# cor(anctracks$proportions2$K1, ancsim$proportions2$pop1)
# ghap.karyoplot(ancsmooth = ancsim, ids = sim$id[66])
# ghap.karyoplot(ancsmooth = anctracks, ids = sim$id[66])


[Package GHap version 3.0.0 Index]