mat.dis {bios2mds} | R Documentation |
Matrices of dissimilarity scores between amino acid sequences
Description
Computes a matrix providing the distances based on dissimilarity scores between sequences from two multiple sequence alignments.
Usage
mat.dis(align1, align2, sub.mat.id = "PAM250", sqrt=FALSE)
Arguments
align1 |
a list of character vectors representing a first multiple sequence alignment. |
align2 |
a list of character vectors representing a second multiple sequence alignment. |
sub.mat.id |
a string of characters indicating the amino acid substitution matrix used to calculate the dissimilarity score. This should be one of "PAM40", "PAM80", "PAM120", "PAM160", "PAM250", "BLOSUM30", "BLOSUM45", "BLOSUM62", "BLOSUM80", "GONNET", "JTT", "JTT_TM" or "PHAT".
The supported substitution matrices are in |
sqrt |
a logical value indicating whether the distance should be equal to the square root of the difference score (TRUE) or not (FALSE). Default is FALSE. |
Details
The dissimilarity score between a sequence i from align1 and a sequence j from align2 is calculated with an amino acid substitution matrix from sub.mat.
If align1 and align2 are identical, mat.dis computes the symmetric matrix of distances between each pair of sequences of the alignment.
Before using mat.dis, users must check that the sequences are properly aligned, both within align1 and within align2, and between align1 and align2.
Value
A named numeric matrix providing the dissimilarity-based distances between each pair of sequences from align1 and align2, based on the substitution matrix sub.mat.id. The numbers of rows and columns equal the numbers of sequences in align1 and align2, respectively.
Author(s)
Julien Pele and Jean-Michel Becu
Examples
# calculating dissimilarity-based distances between a sample of GPCR sequences from
# H. sapiens and D. melanogaster, based on the PAM250 matrix:
aln_human <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds"))
aln_drome <- import.fasta(system.file("msa/drome_gpcr.fa", package = "bios2mds"))
mat.dis1 <- mat.dis(aln_human[1:5], aln_drome[1:5])
mat.dis1
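# a minimal sketch of the sqrt argument, reusing the alignments imported above:
# with sqrt = TRUE, the returned distances should correspond, per the argument
# description, to the square root of the difference scores
# (the object name mat.dis1.sqrt is illustrative, not part of the package):
mat.dis1.sqrt <- mat.dis(aln_human[1:5], aln_drome[1:5], sqrt = TRUE)
mat.dis1.sqrt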
# calculating dissimilarity-based distances between a sample of GPCR sequences from
# H. sapiens and D. melanogaster, based on the BLOSUM45 matrix:
aln_human <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds"))
aln_drome <- import.fasta(system.file("msa/drome_gpcr.fa", package = "bios2mds"))
mat.dis1 <- mat.dis(aln_human[1:5], aln_drome[1:5], sub.mat.id = "BLOSUM45")
mat.dis1
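# a minimal sketch of the case described in Details, where the same alignment
# is passed as both align1 and align2: the result is expected to be a square,
# symmetric matrix with one row and one column per sequence
# (the object names aln_sub and mat.dis.self are illustrative, not part of the package):
aln_sub <- aln_human[1:5]
mat.dis.self <- mat.dis(aln_sub, aln_sub)
# number of rows and columns, expected to match the number of sequences in aln_sub
dim(mat.dis.self)
# symmetry check on the numeric values only (dimnames dropped with unname)
isSymmetric(unname(mat.dis.self))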