R: Pairwise Similarity Matrix

kmeRs_similarity_matrix {kmeRs}

R Documentation

Pairwise Similarity Matrix

Description

The kmeRs_similarity_matrix function generates a pairwise similarity score matrix for given k-mers of length k vs. all possible k-mer combinations. The pairwise similarity score is calculated using a PAM or BLOSUM substitution matrix; versions 30, 40, 70, 120 and 250 are available for PAM, and versions 45, 50, 62, 80 and 100 for BLOSUM. The results are evaluated by global similarity score; a higher similarity score indicates more similar sequences for BLOSUM, and the opposite for PAM matrices.
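To make the scoring idea concrete, here is a minimal base-R sketch that scores equal-length k-mers position by position against a substitution matrix and collects the results into a pairwise data.frame. It is a simplification under stated assumptions: the comparison is ungapped rather than a full Needleman-Wunsch or Smith-Waterman alignment, the matrix `toy_submat` holds invented values over a three-letter alphabet (not real PAM or BLOSUM scores), and the helpers `score_pair` and `pairwise_scores` are hypothetical names that are not part of kmeRs.

```r
# Toy symmetric substitution matrix over a 3-letter alphabet.
# The values are invented for illustration; they are NOT BLOSUM or PAM scores.
toy_submat <- matrix(
  c( 4, -1, -2,
    -1,  5,  0,
    -2,  0,  6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "R", "N"), c("A", "R", "N"))
)

# Ungapped score of two equal-length k-mers:
# the sum of the per-position substitution scores.
score_pair <- function(a, b, submat) {
  a_chars <- strsplit(a, "")[[1]]
  b_chars <- strsplit(b, "")[[1]]
  stopifnot(length(a_chars) == length(b_chars))
  # A two-column character matrix indexes the matrix by (row name, column name).
  sum(submat[cbind(a_chars, b_chars)])
}

# Score every query k-mer against every target k-mer.
pairwise_scores <- function(q, x, submat) {
  scores <- matrix(0, nrow = length(q), ncol = length(x),
                   dimnames = list(q, x))
  for (i in seq_along(q)) {
    for (j in seq_along(x)) {
      scores[i, j] <- score_pair(q[i], x[j], submat)
    }
  }
  as.data.frame(scores)
}

kmers <- c("ARN", "ARR", "NNA")
sim <- pairwise_scores(kmers, kmers, toy_submat)
print(sim)
```

In this sketch each k-mer scores highest against itself, and the table is symmetric because the toy matrix is symmetric. Real alignments with gap penalties can give different scores.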

Usage

kmeRs_similarity_matrix(
  q = NULL,
  x = NULL,
  align.type = "global",
  k = 3,
  seq.type = "AA",
  submat = ifelse(test = (match.arg(toupper(seq.type), c("DNA", "AA")) == "AA"), yes =
    "BLOSUM62", no = NA),
  compare.all = FALSE,
  save_to_file = NULL,
  ...
)

Arguments

`q`	query vector with given k-mers
`x`	k-mers to search the query vector against. If unspecified, `q` will be compared either to the other k-mers within `q` (`compare.all = FALSE`), or to all possible combinations of length `k` (`compare.all = TRUE`)
`align.type`	type of alignment, either `global` or `local`. `global` uses Needleman-Wunsch global alignment to calculate scores, while `local` uses Smith-Waterman local alignment
`k`	length of the k-mers to calculate the similarity matrix for, defaults to 3; e.g. for DNA, `k = 3` gives N = 4^3 = 64 combinations
`seq.type`	type of sequence in question, either 'DNA' or 'AA' (default); this will also modify `q` accordingly, if `q` is unspecified.
`submat`	substitution matrix, defaults to 'BLOSUM62'; available choices are 'BLOSUM45', 'BLOSUM50', 'BLOSUM62', 'BLOSUM80', 'BLOSUM100', 'PAM30', 'PAM40', 'PAM70', 'PAM120' or 'PAM250'
`compare.all`	if `TRUE`, the query vector will be compared to all possible combinations of k-mers (defaults to `FALSE`)
`save_to_file`	if specified, the results will be saved to the given path in comma-separated (CSV) format
`...`	other parameters, e.g. gap opening/extension penalties (`gapOpening`, `gapExtension`), or DNA match/mismatch scores (`na.match`, `na.mismatch`)
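The "all possible combinations" referred to by `k` and `compare.all` can be illustrated with a short base-R sketch that enumerates every k-mer over an alphabet. The helper `all_kmers` is a hypothetical name and not part of kmeRs; the order in which the package itself lists the combinations is not stated here and may differ.

```r
# Enumerate every k-mer over an alphabet (hypothetical helper, not part of kmeRs).
all_kmers <- function(alphabet, k) {
  # One column per position; each row is one combination of letters.
  grid <- expand.grid(rep(list(alphabet), k), stringsAsFactors = FALSE)
  # Paste the columns of each row together into a single string.
  do.call(paste0, grid)
}

dna <- c("A", "C", "G", "T")
dna_3mers <- all_kmers(dna, 3)

length(dna_3mers)        # 4^3 = 64 combinations
"GAT" %in% dna_3mers     # TRUE
```

The count grows as (alphabet size)^k, so the 20-letter amino acid alphabet already yields 20^3 = 8000 combinations at `k = 3`. Comparing against all combinations therefore becomes expensive quickly as `k` increases.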

Value

The similarity matrix is returned as a data.frame.

Examples

# Simple BLOSUM62 similarity matrix for all amino acids
kmeRs_similarity_matrix(submat = "BLOSUM62")

[Package kmeRs version 2.1.0 Index]