MM1.Feature {EncDNA} | R Documentation |
Transforming nucleotide sequences into numeric vectors using first order nucleotide dependency.
Description
Sequence encoding with a first-order Markov model was introduced by Rajapakse and Ho (2005) for the prediction of splice sites, and was later used extensively by Baten et al. (2006) for the same purpose. This encoding procedure accounts for first-order dependencies between adjacent nucleotides in a sequence. Only the positive class dataset is used to estimate the dependencies as probabilities, which are then used for encoding.
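The idea can be illustrated with a minimal base R sketch. It assumes that the dependencies are position-specific conditional probabilities P(x_i | x_{i-1}) estimated from the positive class, and that all sequences have the same length; the package's internal computation may differ in detail. The helper mm1_sketch is hypothetical (it is not part of EncDNA) and works on plain character strings rather than DNAStringSet objects.

```r
# Hypothetical helper, NOT the EncDNA implementation: a sketch of first-order
# Markov encoding under the assumptions stated above.
mm1_sketch <- function(positive, test) {
  nuc <- c("A", "C", "G", "T")
  # Split equal-length sequences into a character matrix (one row per sequence)
  to_matrix <- function(seqs) do.call(rbind, strsplit(toupper(seqs), ""))
  pos <- to_matrix(positive)
  tst <- to_matrix(test)
  n <- ncol(pos)
  stopifnot(n >= 2, ncol(tst) == n)
  enc <- matrix(0, nrow = nrow(tst), ncol = n - 1)
  for (i in 2:n) {
    # Counts of (previous base, current base) pairs at positions i-1 and i
    counts <- table(factor(pos[, i - 1], levels = nuc),
                    factor(pos[, i], levels = nuc))
    # Row-normalise to P(current | previous); unseen previous bases give 0
    probs <- matrix(as.numeric(counts), 4, 4) / pmax(rowSums(counts), 1)
    # Look up the probability of each test sequence's transition at this position
    idx <- cbind(match(tst[, i - 1], nuc), match(tst[, i], nuc))
    enc[, i - 1] <- probs[idx]
  }
  enc
}

# Toy data: three positive sequences and two test sequences of length 4
toy_positive <- c("ACGT", "ACGA", "AAGT")
toy_test <- c("ACGT", "AAGA")
toy_enc <- mm1_sketch(toy_positive, toy_test)
dim(toy_enc)  # one row per test sequence, n - 1 columns
```

In this sketch a transition never observed in the positive class is encoded as 0, and a base other than A, C, G or T yields NA; both choices are assumptions made for illustration.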
Usage
MM1.Feature(positive_class, test_seq)
Arguments
positive_class |
Sequence dataset of the positive class, must be an object of class |
test_seq |
Sequences to be encoded into numeric vectors, must be an object of class |
Details
The FASTA sequences should be read into R using the function readDNAStringSet available in the Biostrings package. This encoding is similar to the PN.FDTF feature as far as the dependency among nucleotides in a sequence is concerned. The only difference is that it uses the positive class only, whereas PN.FDTF uses both the positive and negative classes. The approach also resembles the WAM features (Meher et al. 2016), in which dinucleotide dependencies are considered.
Value
A numeric matrix of order m*(n-1), where m is the number of sequences in test_seq and n is the length of each sequence.
Author(s)
Prabina Kumar Meher, Indian Agricultural Statistics Research Institute, New Delhi-110012, INDIA
References
- Rajapakse, J. and Ho, L.S. (2005). Markov encoding for detecting signals in genomic sequences. IEEE/ACM Trans Comput Biol Bioinf., 2(2): 131-142.
- Baten, A., Chang, B., Halgamuge, S. and Li, J. (2006). Splice site identification using probabilistic parameters and SVM classification. BMC Bioinformatics, 7(Suppl 5): S15.
- Meher, P.K., Sahu, T.K., Rao, A.R. and Wahi, S.D. (2016). Identification of donor splice sites using support vector machine: a computational approach based on positional, compositional and dependency features. Algorithms for Molecular Biology, 11(1): 16.
See Also
Examples
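library(EncDNA)
# Load the droso example dataset
data(droso)
positive <- droso$positive
test <- droso$test
# Use the first 200 positive-class sequences to estimate the dependencies
pos <- positive[1:200]
tst <- test
# Encode the test sequences; the result has one row per test sequence
enc <- MM1.Feature(positive_class=pos, test_seq=tst)
enc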