kmeRs_similarity_matrix {kmeRs}    R Documentation

Pairwise Similarity Matrix

Description

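The kmeRs_similarity_matrix function generates a pairwise similarity score matrix for a set of query k-mers of length k, compared either with each other, with a user-supplied set of k-mers, or with all possible k-mers of length k. The pairwise similarity score is calculated using a PAM or BLOSUM substitution matrix; versions 30, 40, 70, 120 and 250 are available for PAM, and versions 45, 50, 62, 80 and 100 for BLOSUM. Scores are obtained from a global (default) or local alignment. For both matrix families a higher score indicates more similar sequences; note, however, that the matrix numbers run in opposite directions: higher-numbered BLOSUM matrices (e.g. BLOSUM80) are suited to closely related sequences, whereas higher-numbered PAM matrices (e.g. PAM250) are suited to more divergent ones.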
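The space of "all possible k-mers" grows as (alphabet size)^k. The base-R sketch below enumerates that space for a given alphabet. The helper name all_kmers is hypothetical and not part of kmeRs, and the 20-letter amino-acid alphabet (the standard residues in one-letter code) is an assumption that may differ from the alphabet kmeRs uses internally.

```r
# Hypothetical helper (not part of kmeRs): enumerate every k-mer over an alphabet
all_kmers <- function(k, alphabet = c("A", "C", "G", "T")) {
  # One copy of the alphabet per position; expand.grid builds every combination
  grid <- expand.grid(rep(list(alphabet), k), stringsAsFactors = FALSE)
  # Collapse each row of single letters into one k-mer string
  apply(grid, 1, paste, collapse = "")
}

dna_3mers <- all_kmers(3)
length(dna_3mers)  # 4^3 = 64

# Assumption: the 20 standard amino acids in one-letter code
aa <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
        "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
length(all_kmers(2, aa))  # 20^2 = 400
```

Because the count is exponential in k, comparing against all possible amino-acid k-mers becomes expensive quickly (20^3 = 8000 k-mers already for k = 3).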

Usage

kmeRs_similarity_matrix(
  q = NULL,
  x = NULL,
  align.type = "global",
  k = 3,
  seq.type = "AA",
  submat = ifelse(test = (match.arg(toupper(seq.type), c("DNA", "AA")) == "AA"), yes =
    "BLOSUM62", no = NA),
  compare.all = FALSE,
  save_to_file = NULL,
  ...
)

Arguments

q

a character vector of query k-mers

x

k-mers to compare the query vector against. If unspecified, q is compared either with the other k-mers within q (compare.all = FALSE) or with all possible k-mers of length k (compare.all = TRUE)

align.type

type of alignment, either 'global' (default) or 'local'. 'global' uses Needleman-Wunsch global alignment to calculate scores, whereas 'local' uses Smith-Waterman local alignment

k

length of the k-mers for which the similarity matrix is calculated; defaults to 3. For DNA there are 4^k possible k-mers, e.g. 4^3 = 64 combinations for k = 3

seq.type

type of sequence, either 'DNA' or 'AA' (default); if q is unspecified, it is generated to match the chosen sequence type

submat

substitution matrix; defaults to 'BLOSUM62' when seq.type = 'AA' and to NA when seq.type = 'DNA'. Other choices are 'BLOSUM45', 'BLOSUM50', 'BLOSUM80', 'BLOSUM100', 'PAM30', 'PAM40', 'PAM70', 'PAM120' or 'PAM250'

compare.all

if TRUE, the query vector is compared with all possible k-mers of length k; if FALSE (default) and x is unspecified, it is compared with the other k-mers within q

save_to_file

if specified, the file path to which the results are saved in comma-separated (.csv) format

...

other parameters, e.g. gap opening/extension penalties (gapOpening, gapExtension) or DNA match/mismatch scores (na.match, na.mismatch)
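To illustrate the difference between align.type = 'global' and align.type = 'local', the base-R sketch below computes Needleman-Wunsch and Smith-Waterman scores for two short DNA k-mers. It is a simplified stand-in, not the kmeRs implementation: it assumes a plain match/mismatch score and a linear gap penalty, whereas the gapOpening and gapExtension parameters above imply affine gap costs. The function align_score and its arguments are hypothetical.

```r
# Hypothetical sketch (not the kmeRs implementation): dynamic-programming
# alignment score with match/mismatch scores and a linear gap penalty.
align_score <- function(a, b, type = c("global", "local"),
                        match = 1, mismatch = -1, gap = -2) {
  type <- match.arg(type)
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  n <- length(a)
  m <- length(b)
  # H[i + 1, j + 1] holds the best score for the prefixes a[1..i] and b[1..j]
  H <- matrix(0, nrow = n + 1, ncol = m + 1)
  if (type == "global") {
    # Needleman-Wunsch: leading gaps are penalised
    H[, 1] <- gap * (0:n)
    H[1, ] <- gap * (0:m)
  }
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      s <- if (a[i] == b[j]) match else mismatch
      best <- max(H[i, j] + s,          # align a[i] with b[j]
                  H[i, j + 1] + gap,    # a[i] against a gap
                  H[i + 1, j] + gap)    # b[j] against a gap
      # Smith-Waterman: cell scores are never allowed to drop below zero
      H[i + 1, j + 1] <- if (type == "local") max(best, 0) else best
    }
  }
  # Global: score of the full-length alignment; local: best cell anywhere
  if (type == "global") H[n + 1, m + 1] else max(H)
}

align_score("ACT", "ACG", "global")  # 1: two matches, one mismatch
align_score("ACT", "ACG", "local")   # 2: the shared prefix "AC" only
```

The local score is higher here because Smith-Waterman may drop the mismatching last position, while the global alignment must span both k-mers end to end.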

Value

A pairwise similarity matrix, returned as a data.frame.

Examples

# Simple BLOSUM62 similarity matrix for the default amino acid k-mers
kmeRs_similarity_matrix(submat = "BLOSUM62")
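The calls below are further sketches that use only the arguments documented above; the query k-mers are arbitrary illustrative values, and no particular output is claimed.

```r
library(kmeRs)

# Arbitrary amino-acid 3-mers used as the query
q <- c("GHE", "ACT", "WWW")

# Query k-mers compared with each other (x unspecified, compare.all = FALSE)
res_self <- kmeRs_similarity_matrix(q = q, submat = "BLOSUM62")

# Query compared with an explicit set of target k-mers,
# using local alignment and the PAM30 matrix
res_x <- kmeRs_similarity_matrix(q = q, x = c("GHE", "GHD"),
                                 align.type = "local", submat = "PAM30")

# Results also written to a temporary CSV file, then read back
out <- tempfile(fileext = ".csv")
res_saved <- kmeRs_similarity_matrix(q = q, save_to_file = out)
head(read.csv(out))
```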


[Package kmeRs version 2.1.0 Index]