remove_noise_from_bams {noisyr}R Documentation

Function to remove the noisy reads from the BAM files

Description

This function is used to remove the noisy reads from the BAM files. It uses as input the BAM file names, a gene table (usually containing individual exons, made using cast_gtf_to_genes), an expression matrix for each of these genes and a vector of abundance thresholds.

Usage

remove_noise_from_bams(
  bams,
  genes,
  expression,
  noise.thresholds,
  destination.files = base::paste0(base::basename(bams), ".noisefiltered.bam"),
  filter.by = c("gene", "exon"),
  make.index = FALSE,
  unique.only = TRUE,
  mapq.unique = 255,
  ...
)

Arguments

bams

a character vector of the BAM file names

genes

a tibble of the exons extracted from the gtf file; (usually the the output of cast_gtf_to_genes)

expression

the expression matrix or expression summary list, as calculated by calculate_expression_similarity_transcript

noise.thresholds

a vector of expression thresholds by sample; must be the same length as the number of BAM files, or a singular value to be used as a fixed noise threshold

destination.files

names for the output denoised BAM files; by default the same as the original files, appended with ".noisefiltered.bam", but created in the working directory

filter.by

Either "gene" (default) or "exon"; if filter.by="gene", a gene is removed from all BAM files if and only if all of its exons are below the corresponding noise thresholds; if filter.by="exon", then each exon is individually removed (from all samples) if it is below the corresponding noise thresholds.

make.index

whether a BAM index should be generated; if this is FALSE (the default) and no index exists, the function will exit with an error; the index needs to have the same name as each BAM file, but ending with .bam.bai

unique.only

whether only uniquely mapped reads should contribute to the expression of a gene/exon; default is TRUE

mapq.unique

The values of the mapping quality field in the BAM file that corresponds to uniquely mapped reads; by default, values of 255 are used as these correspond to the most popular aligners, but an adjustment might be needed; the mapq scores should be as follows: 255 for STAR, 60 for hisat2, 255 for bowtie in -k mode, 40 for bowtie2 default, 50 for tophat

...

arguments passed on to other methods

Value

Returns a matrix of the same dims as the expression matrix, with the noise removed. This matrix has no entries remaining below the noise threshold.

See Also

remove_noise_from_matrix

Examples

bams <- rep(system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE), 2)
genes <- data.frame("id" = 1:2,
                    "gene_id" = c("gene1", "gene2"),
                    "seqid" = c("seq1", "seq2"),
                    "start" = 1,
                    "end" = 1600)
noise.thresholds <- c(0, 1)
expression.summary = calculate_expression_similarity_transcript(
  bams = bams,
  genes = genes,
  mapq.unique = 99
)
remove_noise_from_bams(
    bams = bams,
    genes = genes,
    expression = expression.summary,
    noise.thresholds = noise.thresholds,
    destination.files = paste0(tempdir(), "/", basename(bams), ".noisefiltered.bam"),
    mapq.unique = 99
)


[Package noisyr version 1.0.0 Index]