NUCKpartComposition_DNA {ftrCOOL} | R Documentation |
Nucleotide to K Part Composition (NUCKpartComposition_DNA)
Description
This function divides each sequence into k parts. Each part has length ceiling(l/k), where l is the length of the sequence. The last part holds the remaining nucleotides, so its length can differ. The nucleotide composition is calculated for each part.
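The partitioning rule described above can be sketched in base R. This is only an illustration under stated assumptions, not the ftrCOOL implementation: `split_k_parts` is a hypothetical helper, and the per-part composition is taken here to be a simple count of A, C, G and T.

```r
# Hypothetical helper (not part of ftrCOOL): split one sequence into parts
# of length ceiling(l / k); the last part takes whatever is left over.
split_k_parts <- function(seq, k) {
  l <- nchar(seq)
  part_len <- ceiling(l / k)
  starts <- seq(1, l, by = part_len)
  substring(seq, starts, pmin(starts + part_len - 1, l))
}

parts <- split_k_parts("ACGTACGTAC", k = 3)
parts  # "ACGT" "ACGT" "AC"

# Assumed notion of composition: count each nucleotide within each part.
nucs <- c("A", "C", "G", "T")
comp <- t(sapply(parts, function(p) {
  table(factor(strsplit(p, "")[[1]], levels = nucs))
}))
rownames(comp) <- paste0("part", seq_along(parts))
comp
```

With l = 10 and k = 3 the part length is ceiling(10/3) = 4, so the third part contains only the two remaining nucleotides.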
Usage
NUCKpartComposition_DNA(
seqs,
k = 5,
ORF = FALSE,
reverseORF = TRUE,
normalized = TRUE,
label = c()
)
Arguments
seqs |
is a FASTA file containing nucleotide sequences, in which each sequence entry begins with a header line starting with '>'. Alternatively, seqs can be a string vector in which each element is a nucleotide sequence. |
k |
is an integer value. Each sequence is divided into k partitions. |
ORF |
(Open Reading Frame) is a logical parameter. If it is TRUE, the ORF region of each sequence is used instead of the original sequence (i.e., 3-frame). |
reverseORF |
is a logical parameter. It takes effect only if ORF is TRUE. If reverseORF is TRUE, the ORF region is searched for in both the sequence and its reverse complement (i.e., 6-frame). |
normalized |
is a logical parameter. When it is FALSE, the function returns the values without normalization. When it is TRUE, the return value is normalized using the length of the sequence. |
label |
is an optional parameter. It is a vector whose length equals the number of sequences, and it gives the class of each entry (i.e., sequence). |
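The reverseORF argument refers to the reverse complement of a sequence. As a small illustration, the reverse complement of a DNA string can be computed in base R as follows. `reverse_complement` is a hypothetical helper, not a function exported by ftrCOOL, and the sketch does not attempt to reproduce how the package locates ORF regions.

```r
# Hypothetical helper (not part of ftrCOOL): reverse complement of a DNA string.
reverse_complement <- function(seq) {
  comp <- chartr("ACGTacgt", "TGCAtgca", seq)           # complement each base
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")    # then reverse the order
}

rc <- reverse_complement("ATGCCGTAA")
rc  # "TTACGGCAT"
```

Applying the helper twice returns the original string, which is a quick sanity check for any reverse-complement routine.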
Value
A feature matrix with k*4 columns. The number of rows is equal to the number of sequences.
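To make the shape of the result concrete, the following base R sketch builds a matrix with one row per sequence and k*4 columns, then applies a length-based normalization. Several things here are assumptions for illustration only: the helper `row_counts`, the column names, the part-major column order, and the choice to divide each row by the full sequence length. The actual column order, names and normalization used by ftrCOOL may differ.

```r
seqs <- c("ACGTACGTAC", "GGGCCCAAATTT")
k <- 3
nucs <- c("A", "C", "G", "T")

# Hypothetical helper: one vector of k*4 counts for a single sequence.
row_counts <- function(seq, k) {
  chars <- strsplit(seq, "")[[1]]
  part_len <- ceiling(length(chars) / k)
  part_id <- factor(ceiling(seq_along(chars) / part_len), levels = seq_len(k))
  # t() makes the flattened vector part-major: part 1 (A, C, G, T), part 2, ...
  as.vector(t(table(part_id, factor(chars, levels = nucs))))
}

raw <- t(sapply(seqs, row_counts, k = k))
colnames(raw) <- paste0("p", rep(seq_len(k), each = 4), "_", nucs)  # made-up names

# Assumed normalization: divide each row by the length of its sequence.
norm <- raw / nchar(seqs)

dim(raw)  # 2 rows (sequences), 12 columns (k * 4)
```

Under this assumed normalization every row of `norm` sums to 1, because each nucleotide of a sequence is counted in exactly one part.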
Note
Warning: The length of all sequences should be greater than k.
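When seqs is supplied as a string vector, the length requirement in this note can be checked before calling the function. The snippet below is a minimal sketch with made-up sequences and names; it only filters the input and does not call ftrCOOL.

```r
# Made-up example input; names are only for readability.
seqs <- c(s1 = "ACGTACGTAC", s2 = "ACG", s3 = "GGGCCCAAATTT")
k <- 5

long_enough <- nchar(seqs) > k   # TRUE where the sequence is longer than k
kept <- seqs[long_enough]        # sequences that satisfy the requirement
dropped <- names(seqs)[!long_enough]
```

Here `s2` has only three nucleotides, so it would be excluded for k = 5.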
Examples
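library(ftrCOOL)
# Path to the example lncRNA FASTA file shipped with the package
fileLNC <- system.file("extdata/Athaliana_LNCRNA.fa", package = "ftrCOOL")
mat <- NUCKpartComposition_DNA(seqs = fileLNC, k = 5, ORF = TRUE, reverseORF = FALSE, normalized = FALSE)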