get_LFDRs {SNVLFDR}R Documentation

Estimates LFDR values per genomic site

Description

Based on a given read count matrix, identifies single nucleotide variants (SNVs) by estimating local false discovery rates (LFDRs). Users can set an initial value for the proportion of non-mutant sites and specify thresholds for allele frequency, read depth and LFDR cut-off value.

Usage

get_LFDRs(
  bam_input,
  bedfile,
  BQ.T,
  MQ.T,
  pi0.initial,
  AF.T,
  DP.T,
  LFDR.T,
  error,
  method,
  epsilon
)

Arguments

bam_input

Path to an input BAM file. The file must be in the format of a csv file generated by bam-readcount (https://github.com/genome/bam-readcount). See Examples.

bedfile

Path to a bed file containing genomic regions of interest. This file has to have three columns with chr# in the first column and start and end positions in the second and third columns respectively. No headers.

BQ.T

Minimum base call quality threshold. It can be set to 0 to include all sites. Otherwise, this threshold excludes sites with a base call quality below the specified threshold. It is recomended to set it to 20.

MQ.T

Minimum mapping quality threshold. It can be set to 0 to include all sites. Otherwise, this threshold excludes sites with a mapping quality below the specified threshold. It is recomended to set it to 20.

pi0.initial

Initial value for the proportion of non-mutant sites. It can be any number between 0 and 1. However it is recommended to set it to a number between 0.95 and 0.99 for more accuracy. If no value is specified, it will be set to 0.95 by default.

AF.T

Allele frequency threshold. It can be set to 0 to include all sites. Otherwise, this threshold excludes sites with an allele frequency below the specified threshold. If no value is specified, it will be set to 0.01 by default.

DP.T

Read depth threshold. It can be set to 0 to include all sites. Otherwise, this threshold excludes sites with a read depth below the specified threshold.

LFDR.T

A number between 0 and 1. It can be set to 0 to include all sites. Otherwise, this threshold excludes sites with an estimated LFDR below the specified threshold. If no value is specified, it will be set to 0.01 by default.

error

Error rate between 0 and 1. If it is set to NULL, a weighted average of average base call quality and average mapping quality per site will be calculated. Otherwise, it may be set to 0.01 or a desired error vector can be introduced by the user.

method

Method used to estimate pi0 and LFDRs. It can be "empirical", "uniform_empirical" or "uniform". If no method is specified, it will be set to "empirical" by default (recommended).

epsilon

The difference between old and new estimates of pi0 used for convergence. If no value is specified, it will be set to 0.01 by default.

Value

A list. Slot estimated.pi0 returns estimated proportion of non-mutant sites. Slot estimated.LFDRs returns estimated LFDRs for genomic sites that were not filtered out. Slot filtered.bam adds estimated LFDRs, model errors and a mutant variable (indicating whether each site is detected to be a mutant (1) or non-mutant (0) site) to the filtered input file .

References

Karimnezhad, A. and Perkins, T.J. (2024). Empirical Bayes single nucleotide variant calling for next-generation sequencing data. Scientific Reports 14, 1550, <doi:10.1038/s41598-024-51958-z>

Examples

bam_input <- system.file("extdata", "bam_input.csv", package="SNVLFDR")
bedfile <- system.file("extdata", "regions.bed", package="SNVLFDR")
BQ.T=20
MQ.T=20
pi0.initial=0.95
AF.T=0.01
DP.T=10
LFDR.T=0.01
error=NULL
method='empirical'
epsilon=0.01
output=get_LFDRs(bam_input,bedfile,BQ.T,MQ.T,pi0.initial,AF.T,DP.T,LFDR.T,error,method,epsilon)


[Package SNVLFDR version 1.0.1 Index]