PSTNPss_DNA {ftrCOOL} | R Documentation |
Position-Specific Trinucleotide Propensity based on single-strand DNA (PSTNPss_DNA)
Description
This function takes a positive data set, a negative data set, and a set of input sequences, and returns a matrix of feature vectors with one row per input sequence. The feature vector for an input sequence of length L is [u(1), u(2), ..., u(L-2)]. For each input sequence, u(1) is the frequency of positive-set sequences that start with the same trinucleotide as the input sequence, minus the frequency of negative-set sequences that start with that trinucleotide. Each u(i) is computed in the same way as u(1), except that the ith trinucleotide is used instead of the first one.
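The calculation described above can be sketched in base R. This is a minimal illustration, not the ftrCOOL implementation: the helpers `trimers_at` and `pstnp_sketch` are hypothetical names, plain character vectors of equal-length sequences are assumed as input, and "frequency" is assumed to mean the fraction of sequences in a set that carry the given trinucleotide at the given position.

```r
# Hypothetical helper (not part of ftrCOOL): the trinucleotide starting at
# each of the positions 1..(L-2) of a single sequence.
trimers_at <- function(s) {
  starts <- seq_len(nchar(s) - 2)
  substring(s, starts, starts + 2)
}

# Hypothetical helper (not part of ftrCOOL): one feature row per input sequence.
# Assumes all sequences in seqs, pos, and neg have the same length.
pstnp_sketch <- function(seqs, pos, neg) {
  # Rows are training sequences, columns are positions 1..(L-2).
  pos_tri <- do.call(rbind, lapply(pos, trimers_at))
  neg_tri <- do.call(rbind, lapply(neg, trimers_at))
  do.call(rbind, lapply(seqs, function(s) {
    tri <- trimers_at(s)
    vapply(seq_along(tri), function(i) {
      # Assumed definition: positive-set fraction minus negative-set fraction.
      mean(pos_tri[, i] == tri[i]) - mean(neg_tri[, i] == tri[i])
    }, numeric(1))
  }))
}

# Toy data, made up for illustration only.
pos_demo <- c("ACGTA", "ACGTT")
neg_demo <- c("ACGAA", "TCGTA")
u <- pstnp_sketch("ACGTA", pos_demo, neg_demo)
print(u)  # one row, three columns: 0.5 0.5 0.0
```

In the toy data, "ACG" opens both positive sequences but only one of the two negative ones, so u(1) = 1 - 0.5 = 0.5.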
Usage
PSTNPss_DNA(seqs, pos, neg, label = c())
Arguments
seqs |
is a FASTA file containing nucleotide sequences. Each sequence record begins with a header line that starts with '>'. Alternatively, seqs can be a string vector in which each element is a nucleotide sequence. |
pos |
is a FASTA file containing nucleotide sequences. Each sequence record begins with a header line that starts with '>'. Alternatively, the value of this parameter can be a string vector. These are the positive sequences in the training model. |
neg |
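is a FASTA file containing nucleotide sequences. Each sequence record begins with a header line that starts with '>'. Alternatively, the value of this parameter can be a string vector. These are the negative sequences in the training model. |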
label |
is an optional parameter. It is a vector whose length is equal to the number of sequences. It shows the class of each entry (i.e., sequence). |
Value
It returns a feature matrix. The number of rows is equal to the number of sequences, and the number of columns is equal to the sequence length minus two (L-2).
Note
The sequences in the positive data set, the negative data set, and the input set should all have the same length.
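One way to confirm this requirement before building features is a small length check on the raw sequences. The sketch below uses base R only; `same_length` is a hypothetical helper, not an ftrCOOL function, and it assumes the sequences are available as character vectors.

```r
# Hypothetical helper (not part of ftrCOOL): TRUE when every sequence across
# all supplied character vectors has the same number of characters.
same_length <- function(...) {
  lens <- nchar(c(...))
  length(unique(lens)) == 1
}

# Toy sequences, made up for illustration only.
ok  <- same_length(c("ACGT", "TTGA"), c("CCCC"), c("GATC"))
bad <- same_length(c("ACGT", "TTGA"), c("CCC"))
```

Here `ok` is TRUE and `bad` is FALSE, because the second call includes a three-letter sequence.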
Examples
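library(ftrCOOL)
# Locate the example data shipped with the package
ptmSeqsADR <- system.file("extdata/", package = "ftrCOOL")
# Read the positive, negative, and test sequences
posSeqs <- fa.read(file = paste0(ptmSeqsADR, "/posDNA.txt"), alphabet = "dna")
negSeqs <- fa.read(file = paste0(ptmSeqsADR, "/negDNA.txt"), alphabet = "dna")
seqs <- fa.read(file = paste0(ptmSeqsADR, "/DNA_testing.txt"), alphabet = "dna")
# Build the feature matrix: one row per sequence, L-2 columns
mat <- PSTNPss_DNA(seqs = seqs, pos = posSeqs, neg = negSeqs)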