HACSim-package {HACSim} | R Documentation |
Iterative Extrapolation of Species' Haplotype Accumulation Curves for Genetic Diversity Assessment
Description
HACSim (Haplotype Accumulation Curve Simulator) employs a novel nonparametric stochastic (Monte Carlo) optimization method of iteratively generating species' haplotype accumulation curves through extrapolation to assess sampling completeness based on the approach outlined in Phillips et al. (2015) <doi:10.1515/dna-2015-0008>, Phillips et al. (2019) <doi:10.1002/ece3.4757> and Phillips et al. (2020) <doi: 10.7717/peerj-cs.243>. HACSim outputs a number of useful summary statistics of sampling coverage ("Measures of Sampling Closeness"), including an estimate of the likely required sample size (along with desired level confidence intervals) necessary to recover a given number/proportion of observed unique species' haplotypes. Any genomic marker can be targeted to assess likely required specimen sample sizes for genetic diversity assessment. The method is particularly well-suited to assess sampling sufficiency for DNA barcoding initiatives. Users can also simulate their own DNA sequences according to various models of nucleotide substitution. A Shiny app is also available.
Details
The DESCRIPTION file:
Package: | HACSim |
Type: | Package |
Title: | Iterative Extrapolation of Species' Haplotype Accumulation Curves for Genetic Diversity Assessment |
Version: | 1.0.6-1 |
Date: | 2022-05-23 |
Author: | Jarrett D. Phillips [aut, cre], Steven H. French [ctb], Navdeep Singh [ctb] |
Maintainer: | Jarrett D. Phillips <phillipsjarrett1@gmail.com> |
Description: | Performs iterative extrapolation of species' haplotype accumulation curves using a nonparametric stochastic (Monte Carlo) optimization method for assessment of specimen sampling completeness based on the approach of Phillips et al. (2015) <doi:10.1515/dna-2015-0008>, Phillips et al. (2019) <doi:10.1002/ece3.4757> and Phillips et al. (2020) <doi: 10.7717/peerj-cs.243>. 'HACSim' outputs a number of useful summary statistics of sampling coverage ("Measures of Sampling Closeness"), including an estimate of the likely required sample size (along with desired level confidence intervals) necessary to recover a given number/proportion of observed unique species' haplotypes. Any genomic marker can be targeted to assess likely required specimen sample sizes for genetic diversity assessment. The method is particularly well-suited to assess sampling sufficiency for DNA barcoding initiatives. Users can also simulate their own DNA sequences according to various models of nucleotide substitution. A Shiny app is also available. |
License: | GPL-3 |
URL: | <https://github.com/jphill01/HACSim.R> <https://github.com/jphill01/HACSim-RShiny-App> <https://jphill01.shinyapps.io/HACSim> |
NeedsCompilation: | yes |
Imports: | ape (>= 5.3), data.table (>= 1.12.8), graphics (>= 3.6.1), matrixStats (>= 0.56.0), pegas (>= 0.13), Rcpp (>= 1.0.3), shiny (>= 1.6.0), stats (>= 3.6.1), utils (>= 3.6.1) |
LinkingTo: | Rcpp, RcppArmadillo |
RoxygenNote: | 6.1.1 |
Packaged: | 2019-10-23 16:00:58 UTC; jarrettphillips |
Index of help topics:
HAC.sim Internal R code HAC.simrep Run a simulation of haplotype accumulation curves for hypothetical or real species HACClass Internal R code HACHypothetical Set up an object to simulate haplotype accumulation curves for a hypothetical species HACReal Set up an object to simulate haplotype accumulation curves for a real species HACSim-package Iterative Extrapolation of Species' Haplotype Accumulation Curves for Genetic Diversity Assessment accumulate Internal C++ code envr Simulation variable storage environment launchApp Launch HACSim R Shiny web app sim.seqs Simulate DNA sequences according to DNA substitution models
Author(s)
Jarrett D. Phillips [aut, cre], Steven H. French [ctb], Navdeep Singh [ctb]
Maintainer: Jarrett D. Phillips <phillipsjarrett1@gmail.com>
References
Phillips, J.D., Gwiazdowski, R.A., Ashlock, D. and Hanner, R. (2015). An exploration of sufficient sampling effort to describe intraspecific DNA barcode haplotype diversity: examples from the ray-finned fishes (Chordata: Actinopterygii). DNA Barcodes, 3: 66-73.
Phillips, J.D., Gillis, D.J. and Hanner, R.H. (2019). Incomplete estimates of genetic diversity within species: Implications for DNA barcoding. Ecology and Evolution, 9(5): 2996-3010.
Phillips, J.D., Gillis, D.J. and Hanner, R.H. (2020). HACSim: An R package to estimate intraspecific sample sizes for genetic diversity assessment using haplotype accumulation curves. PeerJ Computer Science
Examples
## Simulate hypothetical species ##
N <- 100 # total number of sampled individuals
Hstar <- 10 # total number of haplotypes
probs <- rep(1/Hstar, Hstar) # equal haplotype frequency distribution
HACSObj <- HACHypothetical(N = N, Hstar = Hstar,
probs = probs, filename = "output") # outputs a CSV
# file called "output.csv"
## Simulate hypothetical species - subsampling ##
HACSObj <- HACHypothetical(N = N, Hstar = Hstar,
probs = probs, perms = 1000, p = 0.95,
subsample = TRUE, prop = 0.25, conf.level = 0.95,
filename = "output")
## Simulate hypothetical species and all paramaters changed - subsampling ##
HACSObj <- HACHypothetical(N = N, Hstar = Hstar, probs = probs,
perms = 10000, p = 0.90, subsample = TRUE, prop = 0.15,
conf.level = 0.95, filename = "output")
HAC.simrep(HACSObj) # runs a simulation
## Simulate real species ##
## Not run:
## Simulate real species ##
# outputs file called "output.csv"
HACSObj <- HACReal(filename = "output")
## Simulate real species - subsampling ##
HACSObj <- HACReal(subsample = TRUE, prop = 0.15,
conf.level = 0.95, filename = "output")
## Simulate real species and all parameters changed - subsampling ##
HACSObj <- HACReal(perms = 10000, p = 0.90, subsample = TRUE,
prop = 0.15, conf.level = 0.99, filename = "output")
# user prompted to select appropriate FASTA file
HAC.simrep(HACSObj)
## End(Not run)
## Not run:
## Simulate DNA sequences ##
num.seqs <- 100 # number of DNA sequences
num.haps <- 15 # number of haplotypes
length.seqs <- 658 # length of DNA sequences
count.haps <- c(60, rep(10, 2), rep(5, 2), rep(1, 5)) # haplotype frequency distribution
nucl.freqs <- rep(0.25, 4) # nucleotide frequency distribution
subst.model <- "JC69" # desired nucleotide substitution model
mu.rate <- 1e-3 # mutation rate
transi.rate <- NULL # transition rate
transv.rate <- NULL # transversion rate
sim.seqs(num.seqs = num.seqs, num.haps = num.haps, length.seqs = length.seqs,
nucl.freqs = nucl.freqs, count.haps = count.haps, subst.model = subst.model,
transi.rate = transi.rate, transv.rate = transv.rate)
# outputs file called "output.csv"
HACSObj <- HACReal(filename = "output")
## Simulate DNA sequences - subsampling ##
HACSObj <- HACReal(subsample = TRUE, prop = 0.15,
conf.level = 0.95, filename = "output")
## Simulate DNA sequences and all parameters changed - subsampling ##
HACSObj <- HACReal(perms = 10000, p = 0.90, subsample = TRUE,
prop = 0.15, conf.level = 0.99, filename = "output")
# user prompted to select appropriate FASTA file
HAC.simrep(HACSObj)
## End(Not run)