kmer.frac.curve {preseqR} | R Documentation |
Fraction of k-mers observed at least r times
Description
kmer.frac.curve
predicts the expected fraction of k-mers observed at least r times in a
high-throughput sequencing experiment, given the amount of sequencing.
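The amount of sequencing is given in nucleotides, but the quantity being counted is k-mers. The sketch below shows the standard arithmetic that links the two: a read of length L contains L - k + 1 overlapping k-mers. How kmer.frac.curve uses read.len and k internally is an assumption here, not something this page documents, and kmers.sampled is a hypothetical helper that is not part of preseqR.

```r
## Hypothetical helper (not part of preseqR): approximate number of k-mers
## sampled when `seq` nucleotides are sequenced as reads of length
## `read.len`. Assumes every read contributes read.len - k + 1 k-mers.
kmers.sampled <- function(seq, read.len, k) {
  stopifnot(k >= 1, read.len >= k)
  n.reads <- seq / read.len        # reads needed to yield `seq` nucleotides
  n.reads * (read.len - k + 1)     # k-mer start positions across those reads
}

## 1M, 10M and 100M nucleotides of 100 bp reads, counting 31-mers
kmers.sampled(seq = 10^(6:8), read.len = 100, k = 31)   # 7e5, 7e6, 7e7
```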
Usage
kmer.frac.curve(n, k, read.len, seq, r=2, mt=20)
Arguments
n |
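A two-column matrix. The first column is the frequency j = 1, 2, ...; the second column is the number of k-mers observed exactly j times in the initial experiment. |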
k |
The number of nucleotides in a k-mer. |
read.len |
The average length of a read. |
seq |
The amount of nucleotides sequenced; a vector of amounts returns one estimate per value. |
r |
A positive integer: the minimum number of times a k-mer must be observed. Default is 2. |
mt |
A positive integer constraining the possible rational function approximations. Default is 20. |
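The two-column matrix for n can be built in base R from a vector of per-k-mer occurrence counts. The sketch below assumes the frequency-count layout described for n (frequency j, then the number of k-mers seen exactly j times); kmer.counts holds made-up toy values for illustration only. The last line counts distinct k-mers that already reach r occurrences in the initial sample, which is a sanity check on the input and not the quantity kmer.frac.curve returns.

```r
## Toy per-k-mer occurrence counts (made-up values, illustration only)
kmer.counts <- c(1, 1, 2, 3, 1, 2, 10, 1)

## table() on a numeric vector sorts the observed counts numerically
freq <- table(kmer.counts)
n <- cbind(as.numeric(names(freq)),   # frequency j, ascending
           as.vector(freq))           # number of k-mers seen exactly j times
n                                     # rows: (1, 4), (2, 2), (3, 1), (10, 1)

## Distinct k-mers already observed at least r times in this sample
r <- 2
sum(n[n[, 1] >= r, 2])                # 4: the k-mers seen 2, 2, 3 and 10 times
```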
Details
kmer.frac.curve
is mainly designed for metagenomics, to evaluate how saturated a
metagenomic dataset is.
kmer.frac.curve
is the fast version of kmer.frac.curve.bootstrap.
The function does not provide confidence intervals. To obtain confidence
intervals along with the estimates, use the function
kmer.frac.curve.bootstrap.
Value
A two-column matrix. The first column is the amount of sequencing in an
experiment. The second column is the estimated fraction of k-mers observed
at least r times in the experiment.
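Because the result is a plain two-column matrix, a saturation question such as "how much sequencing is needed before the estimated fraction reaches 90%?" reduces to a row lookup. min.seq.for.frac is a hypothetical helper, not part of preseqR, and the toy matrix stands in for real output with made-up values; in practice it would be replaced by the matrix returned by a call like the one in Examples.

```r
## Hypothetical helper (not part of preseqR): smallest sequencing amount
## (column 1) whose estimated fraction (column 2) reaches `target`,
## or NA if no row does.
min.seq.for.frac <- function(curve, target) {
  ok <- which(curve[, 2] >= target)   # which() skips NA estimates
  if (length(ok) == 0) return(NA_real_)
  min(curve[ok, 1])
}

## Placeholder for real output (made-up values), e.g. from
## kmer.frac.curve(n=SRR061157_k31, k=31, read.len=100, seq=10^(6:12), r=10, mt=20)
toy <- cbind(10^(6:9), c(0.05, 0.30, 0.72, 0.91))
min.seq.for.frac(toy, target = 0.9)   # 1e9
```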
Author(s)
Chao Deng
References
Deng, C. and Smith, A. D. (2016). Estimating the number of species to attain sufficient representation in a random sample. arXiv preprint arXiv:1607.02804.
Examples
## load library
library(preseqR)
## import data
data(SRR061157_k31)
## the fraction of 31-mers represented at least 10 times in an experiment when
## sequencing 1M, 10M, 100M, 1G, 10G, 100G, 1T nucleotides
kmer.frac.curve(n=SRR061157_k31, k=31, read.len=100, seq=10^(6:12), r=10, mt=20)