KNNPeptide {ftrCOOL} R Documentation

K-Nearest Neighbor for Peptides (KNNPeptide)

Description

This function requires an additional training data set and its labels. It computes the similarity score between each input sequence and every sequence in the training data set, using the BLOSUM62 matrix. The labels give the class of each sequence in the training data set. For each input sequence, KNNPeptide finds the labels of the most similar training sequences and reports the frequency of each class for each k from 1 to percent.
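The procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `toy_score` is a hypothetical identity-match score standing in for a BLOSUM62 lookup, `knn_class_freq` is a hypothetical helper, and k is assumed to mean "the top k percent of the training set".

```r
# Toy stand-in for a BLOSUM62 lookup (assumption): counts identical residues
# at matching positions of two equal-length sequences.
toy_score <- function(a, b) {
  as.numeric(sum(strsplit(a, "")[[1]] == strsplit(b, "")[[1]]))
}

# Hypothetical helper, not part of ftrCOOL: for each input sequence and each
# k in 1..percent, report the class frequencies among the top k% neighbours.
knn_class_freq <- function(seqs, trainSeq, labeltr, percent) {
  classes <- sort(unique(labeltr))
  feats <- matrix(0, nrow = length(seqs), ncol = length(classes) * percent)
  for (i in seq_along(seqs)) {
    # Similarity of the i-th input sequence to every training sequence
    sims <- vapply(trainSeq, function(tr) toy_score(seqs[i], tr), numeric(1))
    # Training labels ordered from most to least similar
    ranked <- labeltr[order(sims, decreasing = TRUE)]
    for (k in seq_len(percent)) {
      # Assumed interpretation: k percent of the training set, at least 1
      n_top <- max(1, ceiling(length(trainSeq) * k / 100))
      top <- ranked[seq_len(n_top)]
      freq <- vapply(classes, function(cl) mean(top == cl), numeric(1))
      cols <- ((k - 1) * length(classes) + 1):(k * length(classes))
      feats[i, cols] <- freq
    }
  }
  feats
}

train <- c("ACDKA", "ACDKC", "WWYKW", "WYYKW")
labels <- c(1, 1, 0, 0)
feats <- knn_class_freq(c("ACDKD", "WWYKY"), train, labels, percent = 50)
dim(feats)
```

Each row holds one input sequence, and each block of columns holds the class frequencies for one value of k, which matches the matrix shape described under Value.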

Usage

KNNPeptide(seqs, trainSeq, percent = 30, label = c(), labeltr = c())

Arguments

seqs

is a fasta file containing amino acid sequences, in which each sequence starts with a '>' character, or a string vector in which each element is a peptide or protein sequence.

trainSeq

is a fasta file containing amino acid sequences, in which each sequence starts with a '>' character. It can also be a string vector in which each element is a peptide sequence. Each sequence in the training set is associated with a label, which is given in the parameter labeltr.

percent

determines the threshold used to identify the sequences in the training set that are similar to the input sequence.

label

is an optional parameter. It is a vector whose length is equal to the number of sequences. It shows the class of each entry (i.e., sequence).

labeltr

This parameter is a vector whose length is equal to the number of sequences in the training set. It shows the class of each sequence in the training set.

Value

This function returns a feature matrix in which the number of columns is the number of classes multiplied by percent and the number of rows is equal to the number of sequences.
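As a worked illustration of the dimensions stated above, the expected shape can be computed from the inputs. `expected_dim` is a hypothetical helper, not a function of the package, and it assumes that seqs is a string vector and that the number of classes equals the number of distinct values in labeltr.

```r
# Hypothetical helper: expected (rows, columns) of the feature matrix,
# following the rule "rows = sequences, columns = classes * percent".
expected_dim <- function(seqs, labeltr, percent) {
  c(length(seqs), length(unique(labeltr)) * percent)
}

# Two input sequences, two classes, percent = 10
expected_dim(seqs = c("ACDKD", "WWYKY"), labeltr = c(1, 1, 0, 0), percent = 10)
```

With two input sequences, two classes, and percent = 10, this gives 2 rows and 20 columns.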

Note

This function is usable only when all amino acid sequences, in both the training data set and the set of input sequences, have the same length.
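A quick pre-check of this length requirement might look like the following sketch. `same_length` is a hypothetical helper, not part of ftrCOOL, and it assumes that both inputs are string vectors rather than fasta file paths.

```r
# Hypothetical helper: TRUE when every sequence in both vectors
# has the same number of characters.
same_length <- function(seqs, trainSeq) {
  length(unique(nchar(c(seqs, trainSeq)))) == 1
}

same_length(c("ACDKD", "WWYKY"), c("ACDKA", "WYYKW"))  # all of length 5
same_length(c("ACDKD"), c("ACD"))                      # mixed lengths
```

Running such a check before calling the function makes a length mismatch visible early.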

References

Chen, Zhen, et al. "iFeature: a python package and web server for features extraction and selection from protein and peptide sequences." Bioinformatics 34.14 (2018): 2499-2502.

Examples



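# Locate the example data shipped with the package
ptmSeqsADR<-system.file("extdata/",package="ftrCOOL")

# Input sequences: second column of the CSV file
ptmSeqsVect<-as.vector(read.csv(paste0(ptmSeqsADR,"/ptmVect101AA.csv"))[,2])

# Positive and negative sequences used to build the training set
posSeqs<-as.vector(read.csv(paste0(ptmSeqsADR,"/poSeqPTM101.csv"))[,2])
negSeqs<-as.vector(read.csv(paste0(ptmSeqsADR,"/negSeqPTM101.csv"))[,2])

# Keep the first 10 sequences of each class
posSeqs<-posSeqs[1:10]
negSeqs<-negSeqs[1:10]

trainSeq<-c(posSeqs,negSeqs)

# Class labels: 1 for positive, 0 for negative, in the same order as trainSeq
labelPos<-rep(1,length(posSeqs))
labelNeg<-rep(0,length(negSeqs))

labeltr<-c(labelPos,labelNeg)

KNNPeptide(seqs=ptmSeqsVect,trainSeq=trainSeq,percent=10,labeltr=labeltr)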


[Package ftrCOOL version 2.0.0 Index]