gbs2ploidy-package {gbs2ploidy}    R Documentation
Inference of Ploidy from (Genotyping-by-Sequencing) GBS Data
Description
Functions for inferring ploidy from genotyping-by-sequencing (GBS) data, including a function that infers allelic ratios and allelic proportions in a Bayesian framework.
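As a rough illustration of what an allelic ratio is, the sketch below scores a single heterozygous SNP against a set of candidate ratios (1:3, 1:2, 1:1, 2:1, 3:1). It uses a binomial likelihood for the read counts and a uniform prior over the candidates. The candidate set, the function name ratio_posterior and the single-locus model are assumptions made for illustration. estprops fits its own Bayesian model through rjags, and that model is not reproduced here.

```r
## Hypothetical sketch (not the estprops model): posterior probabilities of
## candidate allelic ratios for one heterozygous SNP, assuming binomial read
## sampling and a uniform prior over the candidate ratios.
ratio_posterior <- function(n1, n2,
                            ratios = c("1:3" = 1/4, "1:2" = 1/3, "1:1" = 1/2,
                                       "2:1" = 2/3, "3:1" = 3/4)) {
  ## log-likelihood of n1 reads of allele 1 out of n1 + n2 reads
  loglik <- dbinom(n1, size = n1 + n2, prob = unname(ratios), log = TRUE)
  ## normalise on the log scale for numerical stability (uniform prior)
  w <- exp(loglik - max(loglik))
  post <- w / sum(w)
  names(post) <- names(ratios)
  post
}

## 5 reads of one allele and 15 of the other favour the 1:3 ratio
round(ratio_posterior(5, 15), 3)
```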
Details
The DESCRIPTION file:
Package:      gbs2ploidy
Type:         Package
Title:        Inference of Ploidy from (Genotyping-by-Sequencing) GBS Data
Version:      1.0
Date:         2016-12-01
Author:       Zachariah Gompert
Maintainer:   Zachariah Gompert <zach.gompert@usu.edu>
Depends:      R (>= 2.10), MASS, rjags
Description:  Functions for inference of ploidy from (Genotyping-by-sequencing) GBS data, including a function to infer allelic ratios and allelic proportions in a Bayesian framework.
License:      GPL-3
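Because estprops relies on rjags, it can help to confirm that the packages listed under Depends load before starting a long run. The snippet below is a generic base-R check and is not part of gbs2ploidy. rjags normally cannot be installed or loaded unless the stand-alone JAGS library is present, so a FALSE for rjags usually points to a missing JAGS installation.

```r
## Generic check (not part of gbs2ploidy) that the Depends packages load.
deps <- c("MASS", "rjags")
## requireNamespace() returns TRUE/FALSE instead of throwing an error
ok <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
print(ok)
if (!all(ok)) message("Missing or unloadable: ", paste(deps[!ok], collapse = ", "))
```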
Index of help topics:
dat                   Simulated allele counts
estploidy             Discriminate cytotypes using GBS data
estprops              Estimate allelic proportions
gbs2ploidy-package    Inference of Ploidy from (Genotyping-by-Sequencing) GBS Data
A typical analysis begins by estimating allelic proportions with the estprops function. This is done in a Bayesian framework and is the most computationally intensive part of the analysis (depending on the size of the data set, it can take a day or more). Because estprops depends on rjags, the user must also install the stand-alone program JAGS. Cytotype assignment probabilities are then obtained with the estploidy function, which uses principal component analysis and discriminant analysis. This step can be run with or without a training set of individuals with known ploidy.
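The training-set route can be sketched with base R and MASS. The approach reduces per-individual summaries to a few principal components, fits a linear discriminant analysis on individuals of known ploidy, and reads assignment probabilities from the LDA posterior. The simulated features, the object names and the choice of two PCs are all hypothetical. This is a sketch of the general technique and does not describe estploidy's internals.

```r
## Hypothetical sketch: PCA followed by LDA with a training set of known
## ploidy. Features are simulated; this is not estploidy's implementation.
library(MASS)

set.seed(1)
n <- 100
ploidy <- rep(c(2, 3), each = n / 2)  # hypothetical known cytotypes

## assumed proportions of heterozygous SNPs with ratios 1:3, 1:2, 1:1, 2:1, 3:1
dip  <- c(0.05, 0.10, 0.70, 0.10, 0.05)
trip <- c(0.05, 0.40, 0.10, 0.40, 0.05)
X <- t(sapply(ploidy, function(p) {
  v <- (if (p == 2) dip else trip) + abs(rnorm(5, sd = 0.05))
  v / sum(v)  # renormalise so each row is a set of proportions
}))

## principal components of the scaled features; keep the first two
pcs <- prcomp(X, scale. = TRUE)$x[, 1:2]

## half of the individuals form the training set
train <- sort(sample(seq_len(n), n / 2))
fit <- lda(pcs[train, ], grouping = ploidy[train])

## posterior assignment probabilities for the held-out individuals
post <- predict(fit, pcs[-train, ])$posterior
assigned <- as.numeric(colnames(post)[max.col(post)])
mean(assigned == ploidy[-train])  # proportion correctly assigned
```

With cytotypes as clearly separated as in this simulation, nearly all held-out individuals should be assigned correctly. Real GBS data will be noisier.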
Author(s)
Zachariah Gompert
Maintainer: Zachariah Gompert <zach.gompert@usu.edu>
References
Gompert Z. and Mock K. (XXXX) Detection of individual ploidy levels with genotyping-by-sequencing (GBS) analysis. Molecular Ecology Resources, submitted.
Examples
## load the package and a simulated data set
library(gbs2ploidy)
data(dat)
## Not run:
## obtain posterior estimates of allelic proportions; short chains are used for
## the example, but we recommend at least 1000 MCMC steps with a 500-step
## burn-in
props <- estprops(cov1 = t(dat[[1]]), cov2 = t(dat[[2]]), mcmc.steps = 20,
                  mcmc.burnin = 5, mcmc.thin = 2)
## calculate observed heterozygosity and depth of coverage from the allele count
## data
hx <- apply(!is.na(dat[[1]] + dat[[2]]), 1, mean)
dx <- apply(dat[[1]] + dat[[2]], 1, mean, na.rm = TRUE)
## run estploidy without using known ploidy data
pl <- estploidy(alphas = props, het = hx, depth = dx, train = FALSE, pl = NA,
                set = NA, nclasses = 2, ids = dat[[3]], pcs = 1:2)
## boxplots to visualize posterior assignment probabilities by true ploidy
## (which is known because these are simulated data)
boxplot(pl$pp[,1] ~ dat[[3]],ylab="assignment probability",xlab="ploidy")
## run estploidy with a training data set with known ploidy; the data set is
## split into 100 individuals with known ploidy and 100 that are used for
## inference
truep <- dat[[3]]
trn <- sort(sample(1:200, 100, replace = FALSE))
truep[-trn] <- NA
plt <- estploidy(alphas = props, het = hx, depth = dx, train = TRUE, pl = truep,
                 set = trn, nclasses = 2, ids = dat[[3]], pcs = 1:2)
## boxplots to visualize posterior assignment probabilities for individuals that
## were not part of the training set by true ploidy (which is known because
## these are simulated data)
boxplot(plt$pp[, 1] ~ dat[[3]][-trn], ylab = "assignment probability", xlab = "ploidy")
## End(Not run)
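The heterozygosity and depth summaries used in the examples reduce to row-wise means. The toy matrices below (individuals in rows, loci in columns) are made up. Treating a non-missing pair of counts as a heterozygous call is an assumption about how dat is laid out.

```r
## Made-up allele counts: 2 individuals (rows) x 4 loci (columns).
## NA is assumed to mean "no heterozygous call at this locus".
a1 <- matrix(c(3, NA,  2,  5,
               1,  4, NA, NA), nrow = 2, byrow = TRUE)
a2 <- matrix(c(1, NA,  2,  3,
               3,  2, NA, NA), nrow = 2, byrow = TRUE)
tot <- a1 + a2  # total read depth per individual and locus

hx <- apply(!is.na(tot), 1, mean)        # share of loci with a call
dx <- apply(tot, 1, mean, na.rm = TRUE)  # mean depth over those loci
hx  # 0.75 0.50
dx  # 5.333333 5.000000
```

rowMeans(!is.na(tot)) and rowMeans(tot, na.rm = TRUE) give the same results and are usually faster on large matrices.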