R: Encoding of nucleotide sequences based on sum of absolute...

SAE.Feature {EncDNA}

R Documentation

Encoding of nucleotide sequences based on sum of absolute error (SAE) of each sequence.

Description

The sum of absolute error (SAE) concept was introduced by Meher et al. (2014) for the prediction of donor splice sites, and was subsequently used by the same authors (Meher et al., 2016) to encode splice site motifs for prediction with supervised learning models. This encoding technique also considers all possible pairwise nucleotide dependencies.

Usage

SAE.Feature(positive_class, negative_class, test_seq)

Arguments

`positive_class`	Sequence dataset of the positive class, must be an object of class `DNAStringSet`.
`negative_class`	Sequence dataset of the negative class, must be an object of class `DNAStringSet`.
`test_seq`	Sequences to be encoded into numeric vectors, must be an object of class `DNAStringSet`.

Details

In this encoding approach, a vector of two values is obtained for each sequence. These two values correspond to the encodings obtained when only the positive class dataset is used and when both the positive and negative datasets are used. Both the positive and negative class datasets are therefore required to encode a sequence. This encoding scheme is invariant to the length of the sequence.
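The idea of producing two values per sequence, one from the positive class alone and one from the pooled classes, can be illustrated with a small base R sketch. This is not the EncDNA implementation and may differ from the formula in Meher et al. (2014): the scoring rule (summing the absolute difference between 1 and the reference proportion of each observed nucleotide pair), the helper names `pair_freq` and `sae_score`, and the toy sequences are all assumptions made for illustration.

```r
# Illustrative sketch only: a simplified pairwise-dependency score,
# NOT the algorithm used inside SAE.Feature().
nt <- c("A", "C", "G", "T")

# For every pair of positions (i < j), tabulate the proportion of each
# nucleotide pair in a reference set of equal-length sequences.
pair_freq <- function(seqs) {
  mat <- do.call(rbind, strsplit(seqs, ""))   # one row per sequence
  pairs <- combn(ncol(mat), 2)                # all position pairs
  lapply(seq_len(ncol(pairs)), function(k) {
    i <- pairs[1, k]
    j <- pairs[2, k]
    prop.table(table(factor(mat[, i], levels = nt),
                     factor(mat[, j], levels = nt)))
  })
}

# Assumed score: sum over position pairs of |1 - reference proportion
# of the nucleotide pair observed in the sequence|.
sae_score <- function(seq, freqs) {
  x <- strsplit(seq, "")[[1]]
  pairs <- combn(length(x), 2)
  sum(vapply(seq_len(ncol(pairs)), function(k) {
    abs(1 - as.numeric(freqs[[k]][x[pairs[1, k]], x[pairs[2, k]]]))
  }, numeric(1)))
}

# Hypothetical toy data (all sequences have length 8).
positive <- c("AGGTAAGT", "CAGGTGAG", "AGGTAAGA")
negative <- c("TTGTCCAT", "ACGTTTCA", "GGGTACCT")
test     <- c("AGGTAAGT", "TTGTCCAT")

f_pos <- pair_freq(positive)                  # positive class only
f_all <- pair_freq(c(positive, negative))     # positive and negative pooled

# One row per test sequence, two columns: the same m x 2 shape
# described under Value.
enc <- cbind(
  pos_only = vapply(test, sae_score, numeric(1), freqs = f_pos),
  pooled   = vapply(test, sae_score, numeric(1), freqs = f_all)
)
dim(enc)
```

Under this assumed rule, a test sequence that resembles the positive class receives a smaller positive-only score than one that does not, and each sequence is reduced to two numbers whatever its length.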

Value

A numeric matrix of order m × 2, where m is the number of sequences in `test_seq`.

Author(s)

Prabina Kumar Meher, Indian Agricultural Statistics Research Institute, New Delhi-110012, INDIA

References

Meher, P.K., Sahu, T.K., Rao, A.R. and Wahi, S.D. (2014). A statistical approach for 5' splice site prediction using short sequence motifs and without encoding sequence data. BMC Bioinformatics, 15(1), 362.
Meher, P.K., Sahu, T.K., Rao, A.R. and Wahi, S.D. (2016). Identification of donor splice sites using support vector machine: a computational approach based on positional, compositional and dependency features. Algorithms for Molecular Biology, 11(1), 16.

Examples

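library(EncDNA)

# Load the droso example dataset and extract its components
data(droso)
positive <- droso$positive
negative <- droso$negative
test <- droso$test

# Use the first 200 sequences of each class as the reference sets
pos <- positive[1:200]
neg <- negative[1:200]
tst <- test

# Encode the test sequences: one row per sequence, two columns
enc <- SAE.Feature(positive_class=pos, negative_class=neg, test_seq=tst)
enc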

[Package EncDNA version 1.0.2 Index]