get_LFDRs {SNVLFDR} | R Documentation |
Estimates LFDR values per genomic site
Description
Based on a given read count matrix, identifies single nucleotide variants (SNVs) by estimating local false discovery rates (LFDRs). Users can set an initial value for the proportion of non-mutant sites and specify thresholds for allele frequency, read depth and LFDR cut-off value.
Usage
get_LFDRs(
bam_input,
bedfile,
BQ.T,
MQ.T,
pi0.initial,
AF.T,
DP.T,
LFDR.T,
error,
method,
epsilon
)
Arguments
bam_input |
Path to an input BAM file. The file must be in the format of a csv file generated by bam-readcount (https://github.com/genome/bam-readcount). See Examples. |
bedfile |
Path to a bed file containing genomic regions of interest. This file has to have three columns with chr# in the first column and start and end positions in the second and third columns respectively. No headers. |
BQ.T |
Minimum base call quality threshold. It can be set to 0 to include all sites. Otherwise, this threshold excludes sites with a base call quality below the specified threshold. It is recomended to set it to 20. |
MQ.T |
Minimum mapping quality threshold. It can be set to 0 to include all sites. Otherwise, this threshold excludes sites with a mapping quality below the specified threshold. It is recomended to set it to 20. |
pi0.initial |
Initial value for the proportion of non-mutant sites. It can be any number between 0 and 1. However it is recommended to set it to a number between 0.95 and 0.99 for more accuracy. If no value is specified, it will be set to 0.95 by default. |
AF.T |
Allele frequency threshold. It can be set to 0 to include all sites. Otherwise, this threshold excludes sites with an allele frequency below the specified threshold. If no value is specified, it will be set to 0.01 by default. |
DP.T |
Read depth threshold. It can be set to 0 to include all sites. Otherwise, this threshold excludes sites with a read depth below the specified threshold. |
LFDR.T |
A number between 0 and 1. It can be set to 0 to include all sites. Otherwise, this threshold excludes sites with an estimated LFDR below the specified threshold. If no value is specified, it will be set to 0.01 by default. |
error |
Error rate between 0 and 1. If it is set to NULL, a weighted average of average base call quality and average mapping quality per site will be calculated. Otherwise, it may be set to 0.01 or a desired error vector can be introduced by the user. |
method |
Method used to estimate pi0 and LFDRs. It can be "empirical", "uniform_empirical" or "uniform". If no method is specified, it will be set to "empirical" by default (recommended). |
epsilon |
The difference between old and new estimates of pi0 used for convergence. If no value is specified, it will be set to 0.01 by default. |
Value
A list. Slot estimated.pi0 returns estimated proportion of non-mutant sites. Slot estimated.LFDRs returns estimated LFDRs for genomic sites that were not filtered out. Slot filtered.bam adds estimated LFDRs, model errors and a mutant variable (indicating whether each site is detected to be a mutant (1) or non-mutant (0) site) to the filtered input file .
References
Karimnezhad, A. and Perkins, T.J. (2024). Empirical Bayes single nucleotide variant calling for next-generation sequencing data. Scientific Reports 14, 1550, <doi:10.1038/s41598-024-51958-z>
Examples
bam_input <- system.file("extdata", "bam_input.csv", package="SNVLFDR")
bedfile <- system.file("extdata", "regions.bed", package="SNVLFDR")
BQ.T=20
MQ.T=20
pi0.initial=0.95
AF.T=0.01
DP.T=10
LFDR.T=0.01
error=NULL
method='empirical'
epsilon=0.01
output=get_LFDRs(bam_input,bedfile,BQ.T,MQ.T,pi0.initial,AF.T,DP.T,LFDR.T,error,method,epsilon)