R: Find seed matches in genomic features

SeedMatchR {SeedMatchR}

R Documentation

Find seed matches in genomic features

Description

Find seed matches in a DNAStringSet object of sequences. This function uses get_seed to extract the seed sequence from the guide sequence. The seed is then searched against every sequence in the DNAStringSet object using Biostrings::vcountPattern.

This function returns the input DESeq2 results data.frame with an additional column that contains the match counts for the seed selected by seed.name.
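
The extract-then-count idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helpers rev_comp, seed_target and count_matches are hypothetical, the "mer7m8" seed is assumed to be guide positions 2 to 8, and the search is assumed to look for the reverse complement of the seed in DNA sequences with exact matching only (no mismatches or indels).

```r
# Hypothetical helper: reverse-complement a DNA string using base R only.
rev_comp <- function(dna) {
  comp <- chartr("ACGT", "TGCA", dna)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Hypothetical helper: take guide positions start..stop (2 to 8 is the
# assumed "mer7m8" definition) and return the DNA sequence that a matching
# target would contain.
seed_target <- function(guide, start = 2, stop = 8) {
  seed <- substr(guide, start, stop)
  rev_comp(chartr("U", "T", seed))
}

# Hypothetical helper: count exact occurrences of `pattern` in each
# sequence. The lookahead makes overlapping occurrences count as well.
count_matches <- function(pattern, seqs) {
  hits <- gregexpr(paste0("(?=", pattern, ")"), seqs, perl = TRUE)
  out <- vapply(hits, function(h) sum(h > 0), integer(1))
  names(out) <- names(seqs)
  out
}

guide <- "UUAUAGAGCAAGAACACUGUUUU"
target <- seed_target(guide)

# Toy sequences; the names stand in for transcript IDs.
seqs <- c(tx1 = "AAGCTCTATAGG", tx2 = "GGGGGGGG", tx3 = "GCTCTATAGCTCTATA")
counts <- count_matches(target, seqs)
print(counts)
```

In the real function the per-sequence counts are attached to the DESeq2 results through the GTF mapping, and the search itself is delegated to Biostrings rather than to regular expressions.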

Usage

SeedMatchR(
  res,
  gtf,
  seqs,
  sequence,
  seed.name = "mer7m8",
  col.name = NULL,
  mismatches = 0,
  indels = FALSE,
  tx.id.col = TRUE
)

Arguments

`res`	A DESeq2 results `data.frame`
`gtf`	GTF annotation used to map features to genes. The object must have the columns transcript_id and gene_id.
`seqs`	The `DNAStringSet` object with sequence information for features. The names of the sequences should be the transcript names.
`sequence`	The `DNAString` guide sequence oriented 5' to 3'.
`seed.name`	The name of the seed to extract. Options are: mer8, mer7A1, mer7m8, mer6.
`col.name`	The string to use for the column name. Defaults to the seed name.
`mismatches`	The number of mismatches to allow in the search.
`indels`	Whether to allow indels in the search.
`tx.id.col`	Whether to use the transcript_id column instead of gene_id.

Value

A modified DESeq2 results data.frame with an additional column, named after the chosen seed, that contains the number of seed matches.

Examples


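library(dplyr)

# Guide sequence, oriented 5' to 3'
guide.seq <- "UUAUAGAGCAAGAACACUGUUUU"

# Load the annotation objects for the species of interest
anno.db <- load_species_anno_db("human")

# Extract the feature sequences to search
features <- get_feature_seqs(anno.db$tx.db, anno.db$dna)

# Load test data
res <- Schlegel_2022_Ttr_D1_30mkg

# Filter DESeq2 results for SeedMatchR
res <- filter_deseq(res, fdr.cutoff = 1, fc.cutoff = 0, rm.na.log2fc = TRUE)

# Count mer7m8 seed matches and add them to the results as a new column
res <- SeedMatchR(res, anno.db$gtf, features$seqs, guide.seq, "mer7m8")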

[Package SeedMatchR version 1.1.1 Index]