kmer.frac.curve.bootstrap {preseqR}    R Documentation

Fraction of k-mers observed at least r times with bootstrap

Description

kmer.frac.curve.bootstrap predicts the expected fraction of k-mers observed at least r times in a high-throughput sequencing experiment, given the amount of sequencing. It uses bootstrapping to obtain the estimate and its confidence interval.

Usage

kmer.frac.curve.bootstrap(n, k, read.len, seq, r=2, mt=20, times=30, conf=0.95)

Arguments

n

A two-column matrix. The first column is the frequency j = 1, 2, ...; the second column is N_j, the number of k-mers observed exactly j times in the initial experiment. The first column must be sorted in ascending order.

k

The number of nucleotides in a k-mer.

read.len

The average length of a read.

seq

The number of nucleotides sequenced. A vector of values may be supplied, as in the example.

r

A positive integer: the minimum number of times a k-mer must be observed. Default is 2.

mt

A positive integer constraining possible rational function approximations. Default is 20.

times

The number of bootstrap samples. Default is 30.

conf

The confidence level. Default is 0.95.
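
As an illustration of the input format for n, the following minimal sketch builds a two-column histogram from a vector of per-k-mer occurrence counts using base R only. The vector kmer_counts and the name n_hist are hypothetical and exist only for this illustration; real counts would typically come from a k-mer counting tool.

```r
# Hypothetical occurrence counts, one entry per distinct k-mer.
kmer_counts <- c(1, 1, 1, 2, 2, 3, 5, 1, 2, 1)

# Tabulate how many distinct k-mers were seen exactly j times.
# table() orders the distinct values in ascending order.
tab <- table(kmer_counts)

# Two-column matrix: frequency j (ascending) and N_j.
n_hist <- cbind(j = as.numeric(names(tab)), N_j = as.numeric(tab))
n_hist
```

The first column is ascending by construction, and the second column sums to the number of distinct k-mers.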

Details

This is the bootstrap version of kmer.frac.curve. Each bootstrap sample is generated by randomly sampling the initial sample with replacement. For each bootstrap sample, we construct an estimator. The median of the estimates is used as the prediction for the fraction of k-mers observed at least r times.
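
The resample-then-take-the-median idea can be sketched in base R as follows. This is not the preseqR implementation: toy_estimator is a hypothetical stand-in for the rational function estimator, the multinomial draw is an assumed way of resampling the distinct k-mers with replacement, and n_hist is made-up data.

```r
set.seed(1)  # reproducibility of this sketch only

# Hypothetical histogram: column 1 is j, column 2 is N_j.
n_hist <- cbind(j = 1:5, N_j = c(500, 200, 100, 50, 20))

# Stand-in statistic (NOT the preseqR estimator): the share of distinct
# k-mers in the histogram that were seen at least r times.
toy_estimator <- function(h, r) {
  sum(h[h[, 1] >= r, 2]) / sum(h[, 2])
}

# One bootstrap histogram: resampling the distinct k-mers with replacement
# is equivalent to a multinomial draw over the rows of the histogram.
resample_hist <- function(h) {
  total <- sum(h[, 2])
  counts <- as.vector(rmultinom(1, size = total, prob = h[, 2]))
  cbind(h[, 1], counts)
}

times <- 30
estimates <- replicate(times, toy_estimator(resample_hist(n_hist), r = 2))

# The median across bootstrap samples serves as the point prediction.
point_estimate <- median(estimates)
```

Using the median rather than the mean makes the prediction less sensitive to an occasional bootstrap sample that yields an extreme estimate.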

The confidence interval is constructed based on a lognormal distribution.
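
One common way to build an interval from a lognormal approximation is sketched below. The formula is an assumption chosen for illustration and is not taken from the preseqR source; the vector estimates is made-up data standing in for bootstrap estimates.

```r
# Hypothetical bootstrap estimates of a positive quantity.
estimates <- c(0.40, 0.42, 0.43, 0.41, 0.45, 0.44, 0.42)
conf <- 0.95

center <- median(estimates)
v <- var(estimates)

# Multiplicative factor under a lognormal approximation (assumed form):
# the interval is center / C to center * C.
z <- qnorm((1 + conf) / 2)
C <- exp(z * sqrt(log(1 + v / center^2)))

lower <- center / C
upper <- center * C
c(lower = lower, estimate = center, upper = upper)
```

An interval of this form is asymmetric around the estimate and its lower bound is always positive, which suits quantities that cannot be negative.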

Value

A four-column matrix. The first column is the amount of sequencing in an experiment. The second column is the estimate of the fraction of k-mers observed at least r times in the experiment. The third and fourth columns are the lower and upper bounds of the confidence interval.

Author(s)

Chao Deng

References

Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap. CRC press.

Deng, C., Daley, T., Calabrese, P., Ren, J., & Smith, A.D. (2016). Estimating the number of species to attain sufficient representation in a random sample. arXiv preprint arXiv:1607.02804v3.

Examples

## load library
# library(preseqR)

## import data
# data(SRR061157_k31)

## the fraction of 31-mers represented at least 10 times in an experiment when
## sequencing 1M, 10M, 100M, 1G, 10G, 100G, 1T nucleotides
# kmer.frac.curve.bootstrap(n=SRR061157_k31, k=31, read.len=100, 
#                          seq=10^(6:12), r=10, mt=20) 

[Package preseqR version 4.0.0 Index]