ghap.anctest {GHap} | R Documentation |
Prediction of haplotype ancestry
Description
This function uses prototype alleles to predict ancestry of haplotypes in test samples.
Usage
ghap.anctest(object, blocks = NULL,
prototypes, test = NULL,
only.active.samples = TRUE,
only.active.markers = TRUE,
ncores = 1, verbose = TRUE)
Arguments
object |
A GHap.phase object. |
blocks |
A data frame containing block boundaries, such as supplied by the |
prototypes |
A data frame containing prototype alleles, such as supplied by the |
test |
Character vector of individuals to test. All active individuals are used if this vector is not provided. |
only.active.samples |
A logical value specifying whether only active samples should be included in predictions (default = TRUE). |
only.active.markers |
A logical value specifying whether only active markers should be used for predictions (default = TRUE). |
ncores |
A numeric value specifying the number of cores to be used in parallel computing (default = 1). |
verbose |
A logical value specfying whether log messages should be printed (default = TRUE). |
Details
For each interrogated block, tested haplotypes are assigned to their nearest centroids (i.e., the pseudo-lineages with the smallest Euclidean distances). If no blocks are supplied, the function automatically builds blocks compatible with admixture up to 10 generations in the past based on intermarker distances. This has been chosen according to simulation results, where the use of haplotype blocks compatible with recent admixture (~10 generations) retained reasonable accuracy across most scenarios, regardless of the age of admixture.
Value
The function returns a dataframe with the following columns:
BLOCK |
Block alias. |
CHR |
Chromosome name. |
BP1 |
Block start position. |
BP2 |
Block end position. |
POP |
Original population label. |
ID |
Individual name. |
HAP1 |
Predicted ancestry of haplotype 1. |
HAP2 |
Predicted ancestry of haplotype 2. |
Author(s)
Yuri Tani Utsunomiya <ytutsunomiya@gmail.com>
References
Y.T. Utsunomiya et al. Unsupervised detection of ancestry tracks with the GHap R package. Methods in Ecology and Evolution. 2020. 11:1448–54.
See Also
ghap.anctrain
, ghap.ancsmooth
, ghap.ancplot
, ghap.ancmark
Examples
# #### DO NOT RUN IF NOT NECESSARY ###
#
# # Copy phase data in the current working directory
# exfiles <- ghap.makefile(dataset = "example",
# format = "phase",
# verbose = TRUE)
# file.copy(from = exfiles, to = "./")
#
# # Load phase data
#
# phase <- ghap.loadphase("example")
#
# ### RUN ###
#
# # Calculate marker density
# mrkdist <- diff(phase$bp)
# mrkdist <- mrkdist[which(mrkdist > 0)]
# density <- mean(mrkdist)
#
# # Generate blocks for admixture events up to g = 10 generations in the past
# # Assuming mean block size in Morgans of 1/(2*g)
# # Approximating 1 Morgan ~ 100 Mbp
# g <- 10
# window <- (100e+6)/(2*g)
# window <- ceiling(window/density)
# step <- ceiling(window/4)
# blocks <- ghap.blockgen(phase, windowsize = window,
# slide = step, unit = "marker")
#
# # BestK analysis
# bestK <- ghap.anctrain(object = phase, K = 5, tune = TRUE)
# plot(bestK$ssq, type = "b", xlab = "K",
# ylab = "Within-cluster sum of squares")
#
# # Unsupervised analysis with best K
# prototypes <- ghap.anctrain(object = phase, K = 2)
# hapadmix <- ghap.anctest(object = phase,
# blocks = blocks,
# prototypes = prototypes,
# test = unique(phase$id))
# anctracks <- ghap.ancsmooth(object = phase, admix = hapadmix)
# ghap.ancplot(ancsmooth = anctracks)
#
# # Supervised analysis
# train <- unique(phase$id[which(phase$pop != "Cross")])
# prototypes <- ghap.anctrain(object = phase, train = train,
# method = "supervised")
# hapadmix <- ghap.anctest(object = phase,
# blocks = blocks,
# prototypes = prototypes,
# test = unique(phase$id))
# anctracks <- ghap.ancsmooth(object = phase, admix = hapadmix)
# ghap.ancplot(ancsmooth = anctracks)