vqscustompct {longreadvqs}    R Documentation

Sequencing error minimization with customized % cut-offs for particular nucleotide regions, read down-sampling, and data preparation for viral quasispecies comparison

Description

Minimizes potential long-read sequencing errors by replacing low-frequency nucleotide bases, as defined by the specified cut-off percentages, and down-samples reads for comparison with other samples. In this function, the cut-off percentage can be adjusted separately for different ranges of nucleotide positions, which is useful when sequencing errors are concentrated in a particular part of the reads. The output is a list of several objects representing the diversity of each sample, which is the required input for other functions such as "snvcompare" or "vqscompare".

Arguments

fasta

Input read alignment in FASTA format.

method

Sequencing error minimization method: low-frequency nucleotide bases (below the "pct" cut-off) are replaced either with the consensus base of that position ("conbase": default) or with the base of the dominant haplotype ("domhapbase").

samplingfirst

Down-sample before (TRUE) or after (FALSE: default) error minimization.

pct

Percent cut-off defining the low-frequency nucleotide bases that will be replaced (must be specified).

brkpos

Ranges of nucleotide positions to which the different % cut-offs in "lspct" apply. For example, c("1:50","51:1112") means that the first and second ranges are nucleotide positions 1 to 50 and 51 to 1112, respectively.

lspct

Vector of customized % cut-offs applied to the nucleotide ranges set in "brkpos", in the same order. For example, c(15,8) means that 15% and 8% cut-offs will be applied to the first and second ranges, respectively.

gappct

Percent cut-off applied specifically to gaps (-). If it is not specified or is less than "pct", "gappct" is set equal to "pct" (default).

ignoregappositions

Replace all nucleotides at alignment positions that contain gap(s) with gaps, so that such positions are no longer counted as single nucleotide variants (SNVs). The default is FALSE.

samsize

Sample size (number of reads) after down-sampling. If it is not specified or exceeds the number of reads in the original alignment, down-sampling is not performed (default).

label

String within quotation marks giving the name of the read alignment (optional). Do not use underscores (_) in the label.

Value

A list of 1) "dat": viral quasispecies diversity metrics calculated by the QSutils package (similar to the output of the "vqssub" function), 2) "snvhap": SNV profile of each haplotype with its frequency and a new label, for the "vqscompare" function, 3) "snv": plot of SNV frequency, for the "snvcompare" function, 4) "hapre": DNAStringSet of the read alignment of each haplotype, for the "vqscompare" function, 5) "lab": name of the sample or read alignment.

Examples

## Locate input FASTA file------------------------------------------------------------------------
fastafilepath <- system.file("extdata", "badend.fasta", package = "longreadvqs")

## Prepare data for viral quasispecies comparison using 10% cut-off across all positions----------
nocustom <- vqsassess(fastafilepath, pct = 10, label = "nocustom")

## Prepare data using 10% cut-off for the first 74 positions and 30% cut-off for the rest---------
custom <- vqscustompct(fastafilepath, pct = 10,
                       brkpos = c("1:74","75:84"), lspct = c(10,30), label = "custom")
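
## Illustration only (base R, not the package's internal code): assuming each "brkpos" range-----
## pairs with the "lspct" value at the same index, this expands them to one cut-off per position--
brkpos <- c("1:74", "75:84")
lspct <- c(10, 30)
bounds <- lapply(strsplit(brkpos, ":", fixed = TRUE), as.integer)  # start and end of each range
cutoffbypos <- unlist(Map(function(b, p) rep(p, b[2] - b[1] + 1), bounds, lspct))
table(cutoffbypos)  # 74 positions at 10% and 10 positions at 30%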

## Use "snvcompare" function to check whether SNV profile looks better or not---------------------
snvcompare(samplelist = list(nocustom, custom), ncol = 1)
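
## Hedged sketch: same customized cut-offs, but down-sampling before error minimization-----------
## (samsize = 50 is an arbitrary illustrative value; per the "samsize" argument, no down-sampling-
## is performed if the alignment contains fewer reads than that)----------------------------------
customdown <- vqscustompct(fastafilepath, pct = 10,
                           brkpos = c("1:74","75:84"), lspct = c(10,30),
                           samplingfirst = TRUE, samsize = 50, label = "customdown")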


[Package longreadvqs version 0.1.2 Index]