preseqR.rSAC.sequencing.rmdup {preseqR} | R Documentation |
Predicting r-SAC in WES/WGS
Description
preseqR.rSAC.sequencing.rmdup predicts the expected number of
nucleotides in the genome sequenced at least r times in a sequencing
experiment, based on a shallow sequencing experiment.
Usage
preseqR.rSAC.sequencing.rmdup(n_base, n_read, r=1, mt=20, times=30, conf=0.95)
Arguments
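n_base |
A two-column matrix giving the coverage histogram, which is based on distinct reads.
The first column is the frequency j; the second column is the number of nucleotides covered exactly j times. |
n_read |
A two-column matrix giving the read histogram.
The first column is the frequency j; the second column is the number of reads with exactly j duplicates. |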
r |
A positive integer. Default is 1. |
mt |
A positive integer constraining possible rational function approximations. Default is 20. |
times |
The number of bootstrap samples. Default is 30. |
conf |
The confidence level. Default is 0.95. |
Details
preseqR.rSAC.sequencing.rmdup is designed for sequencing experiments
in which duplicate reads are removed. Duplicate removal is common in
whole-exome sequencing (WES) and is sometimes applied in whole-genome
sequencing (WGS) as well.
The function requires two histograms. The first is the coverage
histogram, which is based on distinct reads. The second gives the
number of reads with exactly j duplicates.
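As a rough illustration of the two inputs, the sketch below builds both histograms in base R from toy data. The helper make_hist and the vectors depth and copies are hypothetical and not part of preseqR; real inputs would come from a coverage tool and a duplicate-marking step. The sketch also assumes that "exactly j duplicates" means a distinct read observed j times.

```r
# Hypothetical helper (not part of preseqR): turn a vector of counts into a
# two-column histogram matrix (j, number of items observed exactly j times).
make_hist <- function(counts) {
  counts <- counts[counts > 0]   # frequency 0 is not part of the histogram
  tab <- table(counts)           # levels are sorted in ascending numeric order
  cbind(j = as.integer(names(tab)), n_j = as.integer(tab))
}

# Toy data, for illustration only.
depth  <- c(0, 1, 1, 2, 3, 3, 3, 5)  # per-base coverage after duplicate removal
copies <- c(1, 1, 1, 2, 2, 4)        # assumed: times each distinct read was observed

n_base <- make_hist(depth)   # coverage histogram
n_read <- make_hist(copies)  # read histogram
```

Both results are two-column matrices with the frequency j in the first column, which is the shape the n_base and n_read arguments are documented to take.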
Value
f |
The estimator for the expected number of nucleotides in the genome
sequenced at least r times. The input is a vector of sequencing efforts t. |
se |
The standard error for the estimator. The input is a vector of sequencing efforts t. |
lb |
The lower bound of the confidence interval. The input is a vector of sequencing efforts t. |
ub |
The upper bound of the confidence interval. The input is a vector of sequencing efforts t. |
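Because all four returned elements take the same vector t, they can be evaluated together and tabulated. The sketch below assumes the return value is a list accessed with $, as in the Examples. The helper summarize_rsac is a hypothetical convenience wrapper, and toy is a made-up stand-in with the same four elements so that the snippet runs without preseqR; its numbers carry no meaning.

```r
# Hypothetical helper (not part of preseqR): evaluate the four returned
# functions at the same sequencing efforts and collect them in a data frame.
summarize_rsac <- function(est, t) {
  stopifnot(all(c("f", "se", "lb", "ub") %in% names(est)))
  data.frame(t = t, estimate = est$f(t), se = est$se(t),
             lower = est$lb(t), upper = est$ub(t))
}

# Made-up stand-in for the return value, for illustration only.
toy <- list(
  f  = function(t) 100 * t / (1 + t),
  se = function(t) rep(2, length(t)),
  lb = function(t) 100 * t / (1 + t) - 4,
  ub = function(t) 100 * t / (1 + t) + 4
)

res <- summarize_rsac(toy, c(10, 20))
```

With a real estimator, passing it in place of toy would give one row per sequencing effort, with the estimate, its standard error, and the confidence bounds side by side.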
Author(s)
Chao Deng
References
Deng, C., Daley, T., Calabrese, P., Ren, J., & Smith, A.D. (2016). Estimating the number of species to attain sufficient representation in a random sample. arXiv preprint arXiv:1607.02804v3.
Examples
## load library
# library(preseqR)
## import data
# data(SRR1301329_1M_base)
# data(SRR1301329_1M_read)
## construct the estimator
# estimator1 <- preseqR.rSAC.sequencing.rmdup(
#   n_base=SRR1301329_1M_base, n_read=SRR1301329_1M_read,
#   r=4, mt=20, times=100, conf=0.95)
## The number of nucleotides in the genome covered at least 4 times, when the
## amount of sequencing is 10 or 20 times that of the initial experiment
# estimator1$f(c(10, 20))
## The standard error of the estimates
# estimator1$se(c(10, 20))
## The confidence interval of the estimates
# lb <- estimator1$lb(c(10, 20))
# ub <- estimator1$ub(c(10, 20))
# matrix(c(lb, ub), byrow=FALSE, ncol=2)
## construct the estimator
# estimator2 <- preseqR.rSAC.sequencing.rmdup(
#   n_base=SRR1301329_1M_base, n_read=SRR1301329_1M_read,
#   r=10, mt=20, times=100, conf=0.95)
## The number of nucleotides in the genome covered at least 10 times, when the
## amount of sequencing is 10 or 20 times that of the initial experiment
# estimator2$f(c(10, 20))
## The standard error of the estimates
# estimator2$se(c(10, 20))
## The confidence interval of the estimates
# lb <- estimator2$lb(c(10, 20))
# ub <- estimator2$ub(c(10, 20))
# matrix(c(lb, ub), byrow=FALSE, ncol=2)