Density.Feature {EncDNA} | R Documentation
Nucleotide sequence encoding with the distribution of trinucleotides.
Description
Each nucleotide sequence is encoded into a numeric vector of the same length, based on the distribution of nucleotides over the sequence. This encoding does not require datasets from two classes; each sequence is encoded independently. The scheme was introduced by Wei et al. (2013), who used it together with MM1.Feature to predict human donor and acceptor splice sites.
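This page does not give the encoding formula, so the following base R sketch only illustrates what a position-wise density encoding could look like. It assumes that the value at position i is the fraction of positions 1 to i holding the same nucleotide as position i. The helper `density_sketch` is hypothetical and is not part of EncDNA, and its output may differ from what `Density.Feature` returns.

```r
# Hypothetical helper, NOT part of EncDNA.
# Assumption: value at position i = (count of the nucleotide at position i
# among positions 1..i) / i.
density_sketch <- function(seq_string) {
  bases <- strsplit(toupper(seq_string), "")[[1]]
  n <- length(bases)
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- sum(bases[seq_len(i)] == bases[i]) / i
  }
  out
}

# Each sequence is encoded on its own; m equal-length sequences of
# length n give an m x n matrix (here 2 x 6).
seqs <- c("ACGTAC", "AAGGTT")
enc_sketch <- t(vapply(seqs, density_sketch, numeric(6), USE.NAMES = FALSE))
dim(enc_sketch)
```

Because each row depends only on its own sequence, no second (negative or positive) class of sequences is needed to compute it.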
Usage
Density.Feature(test_seq)
Arguments
test_seq | Sequence dataset to be encoded; must be an object of class DNAStringSet.
Details
An object of class DNAStringSet can be obtained by reading FASTA sequences with the function readDNAStringSet, available in the Biostrings package of Bioconductor.
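As a minimal sketch of that step, the code below writes two toy sequences to a temporary FASTA file and reads them back with `readDNAStringSet`. The file and the sequences are illustrative assumptions, not data shipped with EncDNA, and the Biostrings package must be installed.

```r
library(Biostrings)  # Bioconductor package providing readDNAStringSet

# Write two toy, equal-length sequences to a temporary FASTA file
# (illustrative data only).
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "ACGTACGTAC", ">seq2", "TTGACCGTAA"), fasta_path)

# Read the file back as a DNAStringSet, the class required for test_seq.
seqs <- readDNAStringSet(fasta_path)
class(seqs)
length(seqs)

# With EncDNA installed, the object could then be passed on, e.g.:
# enc <- EncDNA::Density.Feature(test_seq = seqs)
```

The call to `Density.Feature` is left commented out because these toy sequences are only meant to show how the input object is built.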
Value
A numeric matrix of order m*n, where m is the number of sequences in test_seq and n is the sequence length.
Author(s)
Prabina Kumar Meher, Indian Agricultural Statistics Research Institute, New Delhi-110012, INDIA
References
Bari, A.T.M.G., Reaz, M.R. and Jeong, B.S. (2014). Effective DNA encoding for splice site prediction using SVM. MATCH Commun. Math. Comput. Chem., 71: 241-258.
Examples
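library(EncDNA)        # provides Density.Feature and the droso dataset
data(droso)
test <- droso$test     # test sequences from the droso dataset
tst <- test[1:5]       # keep the first five sequences
enc <- Density.Feature(test_seq=tst)
enc                    # one row per sequence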