summarize_blast_result {disprose}        R Documentation

Summarize BLAST result

Description

Summarize aligned, not aligned and undesirably aligned sequences

Usage

summarize_blast_result(
  sum.aligned = "sp",
  blast.probe.id.var,
  blast.res.id.var,
  blast.res.title.var,
  reference.id.var,
  reference.title.var,
  titles = FALSE,
  add.blast.info = FALSE,
  data.blast.info,
  check.blast.for.source = FALSE,
  source = NULL,
  switch.ids = FALSE,
  switch.table,
  mc.cores = 1,
  digits = 2,
  sep = ";",
  temp.db = NULL,
  delete.temp.db = TRUE,
  return = "summary",
  write.alignment = "DB",
  alignment.db = NULL,
  alignment.table.sp.aligned = NULL,
  alignment.table.sp.not.aligned = NULL,
  alignment.table.nonsp = NULL,
  change.colnames.dots = TRUE,
  file.sp.aligned = NULL,
  file.sp.not.aligned = NULL,
  file.nonsp = NULL,
  verbose = TRUE
)

Arguments

sum.aligned

character; summarize specific or non specific alignments; possible values are "sp" (aligned and not aligned specific subjects) and "nonsp" (aligned non specific subjects)

blast.probe.id.var

vector of query identification numbers from BLAST result data

blast.res.id.var, blast.res.title.var

vectors of subject identification numbers and titles from BLAST result data

reference.id.var, reference.title.var

vectors of identification numbers and titles of specific sequences that should be or might be aligned

titles

logical; include titles in alignment reports

add.blast.info

logical; add other BLAST results

data.blast.info

data frame; additional columns from the BLAST result data to add to the alignment report (used if add.blast.info = TRUE)

check.blast.for.source

logical; delete queries that are not aligned with one obligatory sequence

source

identification number of obligatory sequence for alignment

switch.ids

logical; use different identification numbers for BLAST result's subjects

switch.table

data frame; table of old and new identification numbers (and new titles) linked by row

mc.cores

integer; number of processors for parallel computation (not supported on Windows)

digits

integer; number of decimal places to round the result

sep

character; the field separator character

temp.db

character; temporary SQLite database name and path

delete.temp.db

logical; delete temporary SQLite database afterwards

return

character; returned object; possible values are "list" (list of data frames with alignment summary and report for each probe) and "summary" (data frame with summary for all probes is returned and alignment reports are written into files or SQLite database tables)

write.alignment

character; write alignment reports into files ("file") or SQLite database tables ("DB"); used if return = "summary"

alignment.db, alignment.table.sp.aligned, alignment.table.sp.not.aligned, alignment.table.nonsp

character; SQLite database name and path, and table names (used if write.alignment = "DB")

change.colnames.dots

logical; change dots to underscore in data frame column names (used if write.alignment = "DB")

file.sp.aligned, file.sp.not.aligned, file.nonsp

character; file names and path (used if write.alignment = "file")

verbose

logical; show messages

Details

This function works with a data frame created by the blast_local function. It takes BLAST results and divides aligned subjects into specific (those that should be aligned) and non specific (those that should not be aligned) according to the reference values. The function summarizes the number of aligned and not aligned specific subjects and the number of aligned non specific subjects.
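The split described above can be pictured with base R set operations. This is only a minimal sketch on made-up ids, not the package's internal code; the vectors blast.ids and reference.ids are hypothetical stand-ins for blast.res.id.var and reference.id.var.

```r
# Hypothetical subject ids from a BLAST result (one entry per alignment)
blast.ids <- c("A1", "B2", "A1", "C3", "D4")
# Hypothetical reference ids: the specific sequences
reference.ids <- c("A1", "B2", "E5")

# Specific subjects that were aligned (present in the reference)
sp.aligned <- unique(blast.ids[blast.ids %in% reference.ids])
# Specific subjects that were not aligned (reference ids absent from BLAST)
sp.not.aligned <- setdiff(reference.ids, blast.ids)
# Non specific subjects (aligned but absent from the reference)
nonsp <- unique(blast.ids[!blast.ids %in% reference.ids])

# Counts comparable to what a summary would report
counts <- c(sp.aligned = length(sp.aligned),
            sp.not.aligned = length(sp.not.aligned),
            nonsp = length(nonsp))
```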

When sum.aligned = "sp", aligned and not aligned specific subjects are summarized, and reference.id.var and reference.title.var should contain the sequences that must be aligned. When sum.aligned = "nonsp", aligned non specific subjects are summarized, and reference.id.var should contain the sequences that may be aligned (those not considered non specific); no titles are needed.

When return = "summary", the function returns a summary (number of aligned and not aligned subjects) and writes sorted alignments (the alignment report) into a file (write.alignment = "file") or an SQLite database (write.alignment = "DB"). Usually only subjects' ids and (optionally) titles are returned, but you may add as many BLAST result columns as you like with the add.blast.info and data.blast.info parameters. If you add BLAST results, all alignments are kept in the alignment report; otherwise duplicated subjects are deleted.
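The effect of dropping duplicated subjects can be illustrated on a toy report. This is a sketch under the assumption that duplicates are judged per probe and subject pair; the data frame and its column names are hypothetical.

```r
# Hypothetical alignment report: probe p1 hits subject A1 twice
report <- data.frame(probe = c("p1", "p1", "p1"),
                     subject = c("A1", "A1", "B2"),
                     Qcover = c(100, 95, 90),
                     stringsAsFactors = FALSE)

# With extra BLAST columns every alignment is kept (3 rows)
with.info <- report

# Without extra columns only distinct probe/subject pairs remain (2 rows)
ids.only <- unique(report[, c("probe", "subject")])
```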

By default, the result tables in the database (if write.alignment = "DB") are "sp_aligned", "sp_not_aligned" and "nonsp". Results are written by appending, so if the files or tables already exist, data will be added to them.

If subject identification numbers in the BLAST result data differ from those in reference.id.var, you may use switch.ids = TRUE to replace BLAST ids with new ones according to switch.table. switch.table must be a data frame whose first column holds old ids, second column holds new ids and (optionally) third column holds new titles. Do not use dots in column names.
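A switch table with the layout described above might look like the following. The ids, titles and column names are invented for illustration, and the lookup with match() only shows the idea of an id switch; keeping unmatched ids unchanged is an assumption of this sketch, not documented behavior of the function.

```r
# Hypothetical switch table: old ids, new ids, new titles (no dots in names)
switch.table <- data.frame(old_id = c("X1", "X2", "X3"),
                           new_id = c("WGS_1", "WGS_1", "N3"),
                           new_title = c("genome 1", "genome 1", "sequence 3"),
                           stringsAsFactors = FALSE)

# Hypothetical subject ids from a BLAST result
blast.ids <- c("X2", "X3", "Y9")

# Look up each BLAST id among the old ids
idx <- match(blast.ids, switch.table$old_id)
# Assumption of this sketch: ids without a match are kept as they are
new.ids <- ifelse(is.na(idx), blast.ids, switch.table$new_id[idx])
```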

When check.blast.for.source = TRUE, probes that are not aligned with one special subject (usually the sequence from which the probes were cut) are deleted. The source check is not performed if sum.aligned = "nonsp". The source check is performed after the possible id switch, so source should be an identification number of the same type as the reference ids.
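The source check can be sketched as a filter that keeps only probes with at least one alignment to the source subject. The data frame, its column names and the id "SRC" are hypothetical; this illustrates the idea rather than the package's implementation.

```r
# Hypothetical BLAST result: p1 aligns with the source, p2 does not
blast <- data.frame(probe = c("p1", "p1", "p2"),
                    subject = c("SRC", "A1", "A1"),
                    stringsAsFactors = FALSE)
source.id <- "SRC"

# Probes that have at least one alignment with the source subject
probes.with.source <- unique(blast$probe[blast$subject == source.id])
# Keep only the rows of those probes; p2 is dropped
blast.kept <- blast[blast$probe %in% probes.with.source, ]
```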

The probe identification number must be a character variable.
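If probe ids are stored as numbers or factors, they can be converted before the call. A minimal sketch with made-up ids:

```r
# Hypothetical numeric probe ids
probe.ids <- c(101L, 102L, 101L)
# Convert to character before passing them as blast.probe.id.var
probe.ids <- as.character(probe.ids)

# Factors convert the same way and keep their labels
probe.factor <- factor(c("p1", "p2"))
probe.chr <- as.character(probe.factor)
```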

If the alignment report is written into a database, the probe identification variable is indexed in all tables. It is also highly recommended to set change.colnames.dots = TRUE to change possible dots to underscores within the result data frame's column names and avoid further mistakes.
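The renaming that change.colnames.dots refers to amounts to replacing dots with underscores in column names. A base R sketch on a hypothetical data frame (the function's own renaming may differ in detail):

```r
# Hypothetical report with dots in the column names
report <- data.frame(probe.id = c("p1", "p2"),
                     subject.id = c("A1", "B2"),
                     stringsAsFactors = FALSE)

# fixed = TRUE treats "." as a literal dot, not a regex wildcard
names(report) <- gsub(".", "_", names(report), fixed = TRUE)
```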

While working, the function saves data in a temporary SQLite database. The function will stop if a database with the same name already exists, so deleting the temporary database is highly recommended.
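Because an existing database file stops the function, it can help to clear a leftover file before the call. A minimal sketch, assuming the temporary database lives in the session's temporary directory under the hypothetical name temp.db:

```r
# Hypothetical location of the temporary database
temp.db <- file.path(tempdir(), "temp.db")

# Remove a leftover file from an earlier, interrupted run
if (file.exists(temp.db)) {
  file.remove(temp.db)
}
```

The same path can then be passed as temp.db together with delete.temp.db = TRUE.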

Value

A list of data frames with alignment summary and report for each probe, or a data frame with summary for all probes (alignment reports are written into files or SQLite database tables).

Author(s)

Elena N. Filatova

Examples

path <- tempdir()
# tempdir() normally exists already; suppress the warning if it does
dir.create (path, showWarnings = FALSE)
# load blast results with subject accession numbers
data(blast.fill)
# load metadata of all Chlamydia pneumoniae sequences - they are subjects that
# do not count as nonspecific and may be aligned
data(meta.all)
# load metadata with target Chlamydia pneumoniae sequences - they are specific subjects
# that must be aligned
data(meta.target)
# make new accession numbers to count all WGS sequences as one (see unite_NCBI_ac.nums ())
meta.target.new.ids <- unite_NCBI_ac.nums (data = meta.target,
                                          ac.num.var = meta.target$GB_AcNum,
                                          title.var = meta.target$title,
                                          db.var = meta.target$source_db,
                                          type = "shotgun", order = TRUE,
                                          new.titles = TRUE)
# summarize blast results, count aligned specific subjects with "switch ids" option
# (WGS sequences are counted as one). Add query cover information.
blast.sum.sp <- summarize_blast_result (sum.aligned = "sp",
                                       blast.probe.id.var = blast.fill$Qid,
                                       blast.res.id.var = blast.fill$Racc,
                                       blast.res.title.var = blast.fill$Rtitle,
                                       reference.id.var = meta.target.new.ids$new.id,
                                       reference.title.var = meta.target.new.ids$new.title,
                                       titles = TRUE,
                                       add.blast.info = TRUE,
                                       data.blast.info = data.frame(Qcover = blast.fill$Qcover),
                                       switch.ids = TRUE, switch.table = meta.target.new.ids,
                                       temp.db = paste0 (path, "/temp.db"), delete.temp.db = TRUE,
                                       return = "summary", write.alignment = "DB",
                                       alignment.db = paste0 (path, "/alig.db"))
# summarize nonspecific alignments (that are not in meta.all dataframe)
blast.sum.nonsp <- summarize_blast_result (sum.aligned = "nonsp",
                                          blast.probe.id.var = blast.fill$Qid,
                                          blast.res.id.var = blast.fill$Racc,
                                          blast.res.title.var = blast.fill$Rtitle,
                                          reference.id.var = meta.all$GB_AcNum,
                                          reference.title.var = meta.all$title,
                                          titles = TRUE, switch.ids = FALSE,
                                          add.blast.info = TRUE,
                                          data.blast.info = data.frame(Qcover = blast.fill$Qcover),
                                          temp.db = paste0 (path, "/temp.db"),
                                          delete.temp.db = TRUE,
                                          return = "summary", write.alignment = "DB",
                                          alignment.db = paste0 (path, "/alig.db"))
# all specific targets are aligned
sp.aligned <- read_from_DB(database = paste0 (path, "/alig.db"), table = "sp_aligned")
# no targets that are not aligned
sp.not.aligned <- read_from_DB(database = paste0 (path, "/alig.db"), table = "sp_not_aligned")
# No nonspecific alignments
nonsp <- read_from_DB(database = paste0 (path, "/alig.db"), table = "nonsp")
file.remove (paste0 (path, "/alig.db"))


[Package disprose version 0.1.6 Index]