PSTNPds {ftrCOOL} | R Documentation |
Position-Specific Trinucleotide Propensity based on double-strand (PSTNPds)
Description
This function works like PSTNPss_DNA except that it considers T as A and G as C. So it converts Ts in the sequence to A and Gs to C. Then, it works with 2 alphabets A and C. For more details refer to PSTNPss_DNA.
Usage
PSTNPds(seqs, pos, neg, label = c())
Arguments
seqs |
is a FASTA file containing nucleotide sequences. The sequences start with '>'. Also, seqs could be a string vector. Each element of the vector is a nucleotide sequence. |
pos |
is a fasta file containing nucleotide sequences. Each sequence starts with '>'. Also, the value of this parameter can be a string vector. The sequences are positive sequences in the training model. |
neg |
is a fasta file containing nucleotide sequences. Each sequence starts with '>'. Also, the value of this parameter can be a string vector. The sequences are negative sequences in the training model. |
label |
is an optional parameter. It is a vector whose length is equal to the number of sequences. It shows the class of each entry (i.e., sequence). |
Value
It returns a feature matrix. The number of columns is equal to the length of sequences minus two and the number of rows is equal to the number of sequences.
Note
The length of the sequences in positive and negative data sets and the input sets should be equal.
References
Chen, Zhen, et al. "iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data." Briefings in bioinformatics 21.3 (2020): 1047-1057.
Examples
ptmSeqsADR<-system.file("extdata/",package="ftrCOOL")
posSeqs<-fa.read(file=paste0(ptmSeqsADR,"/posData.txt"),alphabet="dna")
negSeqs<-fa.read(file=paste0(ptmSeqsADR,"/negData.txt"),alphabet="dna")
seqs<-fa.read(file=paste0(ptmSeqsADR,"/testData.txt"),alphabet="dna")
PSTNPds(seqs=seqs,pos=posSeqs[1],neg=negSeqs[1])