simMatrix {RFLPtools} | R Documentation |
Similarity matrix for BLAST data.
Description
Computes a similarity matrix from all-vs-all BLAST results of rDNA sequences generated with standalone BLAST from NCBI or with the local BLAST implemented in BioEdit.
Usage
simMatrix(x, sequence.range = FALSE, Min, Max)
Arguments
x
data.frame with BLAST data; see
sequence.range
logical: use sequence range.
Min
minimum sequence length.
Max
maximum sequence length.
Details
The given BLAST data are used to compute a similarity matrix with the following algorithm. First, the length of each sequence (LS) contained in the input data is extracted. If a sequence has more than one comparison covering different parts of that sequence, the one with the maximum base length is chosen. Next, the number of matching bases (mB) is calculated from two variables of the BLAST output: the identity between sequences (%) is multiplied by the number of nucleotides and divided by 100. The resulting value is rounded to an integer. The similarity is then calculated by dividing mB by LS. Finally, the similarity matrix including all sequences is built. If the similarity of a combination does not appear in the BLAST report file (because the similarity was lower than 70%), this comparison is entered in the similarity matrix as zero.
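The steps described above can be illustrated with a minimal sketch in base R. It is not the code of simMatrix itself. The toy data frame, its column names (query.id, subject.id, identity, alignment.length), the use of self-hits to obtain LS, the division by the LS of the query sequence, and the handling of repeated pairs are all assumptions made for illustration.

```r
## Toy all-vs-all BLAST table (hypothetical values and column names)
blast <- data.frame(
  query.id         = c("A", "A", "A", "B", "B", "C"),
  subject.id       = c("A", "A", "B", "B", "A", "C"),
  identity         = c(100, 100, 90, 100, 90, 100),
  alignment.length = c(200, 150, 100, 180, 100, 120),
  stringsAsFactors = FALSE
)

## Step 1: sequence length LS, assumed here to be the longest self-hit
self <- blast[blast$query.id == blast$subject.id, ]
LS <- sapply(split(self$alignment.length, self$query.id), max)

## Step 2: matching bases mB = identity (%) * number of nucleotides / 100,
## rounded to integer
blast$mB <- round(blast$identity * blast$alignment.length / 100)

## Step 3: similarity = mB / LS (assumption: LS of the query sequence)
blast$sim <- blast$mB / unname(LS[blast$query.id])

## Step 4: build the matrix; pairs missing from the BLAST table stay zero
ids <- names(LS)
sim <- matrix(0, nrow = length(ids), ncol = length(ids),
              dimnames = list(ids, ids))
## Assumption: if a pair occurs more than once, the longest alignment wins
## (rows are sorted ascending, so the longest hit is written last)
blast <- blast[order(blast$alignment.length), ]
sim[cbind(blast$query.id, blast$subject.id)] <- blast$sim

sim
```

In this toy input no comparison between A and C or between B and C is reported, so those entries remain zero, as described for combinations that are absent from the BLAST report.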
Value
Similarity matrix.
Author(s)
Fabienne Flessa Fabienne.Flessa@uni-bayreuth.de,
Alexandra Kehl Alexandra.Kehl@uni-tuebingen.de,
Matthias Kohl Matthias.Kohl@stamats.de
References
Standalone Blast download: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
Blast News: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastNews
BioEdit: https://bioedit.software.informer.com/
Persoh, D., Melcher, M., Flessa, F., Rambold, G.: First fungal community analyses of endophytic ascomycetes associated with Viscum album ssp. austriacum and its host Pinus sylvestris. Fungal Biology 2010 Jul; 114(7):585-96.
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
See Also
Examples
data(BLASTdata)
## without sequence range
## code takes some time
## Not run:
res <- simMatrix(BLASTdata)
## End(Not run)
## with sequence range
range(BLASTdata$alignment.length)
res1 <- simMatrix(BLASTdata, sequence.range = TRUE, Min = 100, Max = 450)
res2 <- simMatrix(BLASTdata, sequence.range = TRUE, Min = 500)