KNNPeptide {ftrCOOL} | R Documentation |
K-Nearest Neighbor for Peptides (KNNPeptide)
Description
This function requires an additional training data set and its class labels. It computes the similarity score between each input sequence and every sequence in the training data set, using the BLOSUM62 matrix. The label gives the class of each sequence in the training data set. KNNPeptide finds the labels of the training sequences most similar to each input sequence and reports the frequency of each class for each k, where k runs from 1 to percent and is read here as the top k% most similar training sequences.
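The procedure described above can be sketched in base R. This is only an illustration, not the package's implementation: `toy_score` is a hypothetical match/mismatch score standing in for BLOSUM62, `knn_freq` is a hypothetical helper, and both the rounding of the top k% and the use of relative frequencies are assumptions.

```r
# Toy stand-in for BLOSUM62 (assumption): +1 per match, -1 per mismatch.
toy_score <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))  # equal-length sequences only
  sum(ifelse(x == y, 1, -1))
}

# Hypothetical helper: for one query sequence, return the class frequencies
# among the top k% most similar training sequences, for k = 1..percent.
knn_freq <- function(query, train, labels, percent) {
  scores  <- vapply(train, toy_score, numeric(1), b = query)
  ord     <- order(scores, decreasing = TRUE)
  classes <- sort(unique(labels))
  out <- sapply(seq_len(percent), function(k) {
    # Assumed rounding: at least one neighbour, otherwise ceiling of k%.
    n_top <- max(1, ceiling(length(train) * k / 100))
    top   <- labels[ord[seq_len(n_top)]]
    vapply(classes, function(cl) mean(top == cl), numeric(1))
  })
  as.vector(out)  # length = number of classes * percent
}

train  <- c("ACDKA", "ACDKG", "WWYYF", "WWYFF")
labels <- c(1, 1, 0, 0)
feat   <- knn_freq("ACDKC", train, labels, percent = 50)
length(feat)
```

In this toy run the query is closest to the two class-1 training sequences, so every k reports frequency 0 for class 0 and 1 for class 1.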
Usage
KNNPeptide(seqs, trainSeq, percent = 30, label = c(), labeltr = c())
Arguments
seqs |
is a FASTA file with amino acid sequences, in which each sequence starts with a '>' character. Alternatively, it can be a string vector in which each element is a peptide or protein sequence. |
trainSeq |
is a FASTA file with amino acid sequences, in which each sequence starts with a '>' character. Alternatively, it can be a string vector in which each element is a peptide sequence. Each sequence in the training set is associated with a label, which is given in the parameter labeltr. |
percent |
determines the threshold used to identify the sequences in the training set that are similar to the input sequence. The default is 30. |
label |
is an optional parameter. It is a vector whose length is equal to the number of input sequences. It gives the class of each entry (i.e., sequence). |
labeltr |
is a vector whose length is equal to the number of sequences in the training set. It gives the class of each sequence in the training set. |
Value
This function returns a feature matrix in which the number of columns is the number of classes multiplied by percent, and the number of rows is equal to the number of input sequences.
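As a quick check of the stated dimensions, the expected number of columns can be worked out before calling the function. The values below mirror the Examples section of this page (two classes, percent = 10); `expected_cols` is a name introduced here for illustration only.

```r
# Labels and percent as used in the Examples section of this page.
labeltr <- c(rep(1, 10), rep(0, 10))
percent <- 10

# Number of classes multiplied by percent, as stated under Value.
n_classes     <- length(unique(labeltr))
expected_cols <- n_classes * percent
expected_cols
```

If the returned matrix is stored in a variable, its `ncol()` can be compared with `expected_cols`.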
Note
This function can only be used when all amino acid sequences, in both the training data set and the set of input sequences, have the same length.
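Because of this length requirement, it may help to verify the inputs before calling KNNPeptide. The following is a minimal sketch for the string-vector form of the inputs; the sequences are made up for illustration and `same_length` is a hypothetical variable name.

```r
# Made-up input and training sequences (string-vector form).
seqs     <- c("ACDKC", "WWYYA")
trainSeq <- c("ACDKA", "ACDKG", "WWYYF")

# All sequences, input and training alike, must share one length.
lens        <- nchar(c(seqs, trainSeq))
same_length <- length(unique(lens)) == 1
same_length
```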
References
Chen, Zhen, et al. "iFeature: a python package and web server for features extraction and selection from protein and peptide sequences." Bioinformatics 34.14 (2018): 2499-2502.
Examples
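# Locate the example data shipped with ftrCOOL
ptmSeqsADR<-system.file("extdata/",package="ftrCOOL")
# Read the sequences from the second column of each CSV file
ptmSeqsVect<-as.vector(read.csv(paste0(ptmSeqsADR,"/ptmVect101AA.csv"))[,2])
posSeqs<-as.vector(read.csv(paste0(ptmSeqsADR,"/poSeqPTM101.csv"))[,2])
negSeqs<-as.vector(read.csv(paste0(ptmSeqsADR,"/negSeqPTM101.csv"))[,2])
# Use the first 10 positive and 10 negative sequences as the training set
posSeqs<-posSeqs[1:10]
negSeqs<-negSeqs[1:10]
trainSeq<-c(posSeqs,negSeqs)
# Training labels: 1 for positive sequences, 0 for negative sequences
labelPos<-rep(1,length(posSeqs))
labelNeg<-rep(0,length(negSeqs))
labeltr<-c(labelPos,labelNeg)
# Compute the KNN features with percent = 10
KNNPeptide(seqs=ptmSeqsVect,trainSeq=trainSeq,percent=10,labeltr=labeltr)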