Predoss.Feature {EncDNA}    R Documentation

Encoding nucleotide sequences using all possible dinucleotide dependencies.

Description

This encoding considers not only the dependencies between adjacent nucleotides but also the associations between non-adjacent nucleotides. The MM1 and PN.FDTF features, by contrast, take into account only the dependencies between adjacent nucleotides. All possible pairwise dependencies were first introduced by Meher et al. (2014) for predicting splice sites with a probabilistic approach. The same authors later used these associations to encode a splice site dataset for prediction with machine learning classifiers (Meher et al., 2016).
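The idea of scoring every pair of positions, not only neighbouring ones, can be illustrated with a small base-R sketch. It is a hypothetical illustration, not the EncDNA implementation. The helper names (to_matrix, pair_freq, encode_pairs) are invented here, and sequences are plain equal-length character strings rather than DNAStringSet objects. The score for each ordered position pair (i, j) is assumed to be the relative frequency of the observed dinucleotide in the positive class minus its relative frequency in the negative class. Taking all ordered pairs, including i = j, gives n^2 values per sequence, which matches the matrix order stated under Value.

```r
# HYPOTHETICAL sketch of an all-pairs positional dinucleotide encoding.
# Not the EncDNA code: the scoring rule (positive-class minus
# negative-class relative frequency) is an assumption for illustration.

# Split equal-length sequences into a character matrix (rows = sequences).
to_matrix <- function(seqs) {
  do.call(rbind, strsplit(toupper(seqs), ""))
}

# Relative frequency of nucleotide a at position i together with
# nucleotide b at position j across the rows of mat.
pair_freq <- function(mat, i, j, a, b) {
  mean(mat[, i] == a & mat[, j] == b)
}

# Encode each test sequence as an n^2 vector: one value per ordered
# position pair (i, j), including i == j.
encode_pairs <- function(pos, neg, test) {
  P <- to_matrix(pos)
  N <- to_matrix(neg)
  Tm <- to_matrix(test)
  n <- ncol(Tm)
  out <- matrix(0, nrow = nrow(Tm), ncol = n * n)
  for (s in seq_len(nrow(Tm))) {
    k <- 0
    for (i in seq_len(n)) {
      for (j in seq_len(n)) {
        k <- k + 1
        a <- Tm[s, i]  # observed nucleotide at position i
        b <- Tm[s, j]  # observed nucleotide at position j
        out[s, k] <- pair_freq(P, i, j, a, b) - pair_freq(N, i, j, a, b)
      }
    }
  }
  out
}

# Toy inputs, made up for illustration only
pos  <- c("ACGT", "ACGA", "ACTT")
neg  <- c("TTGA", "GCAT", "ACCA")
test <- c("ACGT", "TTCA")

enc <- encode_pairs(pos, neg, test)
dim(enc)  # 2 rows x 16 columns (n = 4, so n^2 = 16)
```

In this sketch, the pairs with |i - j| > 1 carry the non-adjacent dependencies; an encoding restricted to neighbours would keep only the j = i + 1 terms.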

Usage

Predoss.Feature(positive_class, negative_class, test_seq)

Arguments

positive_class

Sequence dataset of the positive class, must be an object of class DNAStringSet.

negative_class

Sequence dataset of the negative class, must be an object of class DNAStringSet.

test_seq

Sequences to be encoded into numeric vectors, must be an object of class DNAStringSet.

Details

This encoding transforms nucleotide sequences into numeric feature vectors, which can then be used as input to supervised learning models for classification.
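As a minimal sketch of that downstream step, the encoded matrix can be passed to any classifier that accepts numeric features. Here a synthetic 100 x 4 matrix stands in for the output of Predoss.Feature. The data, the column names f1 to f4 and the 0.5 cut-off are all assumptions made for illustration. Base R's glm then fits a logistic regression.

```r
# SYNTHETIC stand-in for an encoded matrix: rows = sequences,
# columns = features. In practice X would come from Predoss.Feature().
set.seed(42)
m <- 100
X <- matrix(rnorm(m * 4), nrow = m,
            dimnames = list(NULL, paste0("f", 1:4)))

# Synthetic class labels (1 = positive, 0 = negative), loosely tied to f1
y <- as.integer(X[, "f1"] + rnorm(m) > 0)

# Fit a logistic regression on the encoded features
train <- data.frame(y = y, X)
fit <- glm(y ~ ., data = train, family = binomial)

# Predicted probability of the positive class, then a 0.5 cut-off
prob <- predict(fit, newdata = data.frame(X), type = "response")
pred <- as.integer(prob > 0.5)
mean(pred == y)  # training accuracy
```

Because the encoded matrix has n^2 columns, the number of features can exceed the number of training sequences. In that case, a penalized or tree-based learner is usually a better choice than plain glm.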

Value

A numeric matrix of order m × n^2, where m is the number of sequences in test_seq and n is the length of each sequence.

Note

The dimension of the feature space grows quadratically with sequence length: sequences of length n yield n^2 features.
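The growth implied by the m × n^2 order can be checked with a little arithmetic. This sketch assumes the output is an ordinary double-precision matrix (8 bytes per entry) and ignores R's small per-object overhead. n_features and approx_mb are illustrative helpers, not package functions.

```r
# Illustrative helpers (not part of EncDNA).
# Columns produced for sequences of length n, per the m x n^2 order.
n_features <- function(n) n^2

# Approximate size in MB of an m x n^2 double matrix (8 bytes per entry)
approx_mb <- function(m, n) m * n_features(n) * 8 / 1e6

lengths <- c(10, 20, 40, 80)
data.frame(length = lengths, features = n_features(lengths))
# doubling the length quadruples the feature count: 100, 400, 1600, 6400

approx_mb(1000, 40)  # 12.8 MB for 1000 sequences of length 40
```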

Author(s)

Prabina Kumar Meher, Indian Agricultural Statistics Research Institute, New Delhi-110012, INDIA

References

  1. Meher, P.K., Sahu, T.K., Rao, A.R. and Wahi, S.D. (2014). A statistical approach for 5' splice site prediction using short sequence motifs and without encoding sequence data. BMC Bioinformatics, 15(1), 362.

  2. Meher, P.K., Sahu, T.K., Rao, A.R. and Wahi, S.D. (2016). A computational approach for prediction of donor splice sites with improved accuracy. Journal of Theoretical Biology, 404, 285-294.

Examples

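library(EncDNA)   # attach the package and its example data

data(droso)
positive <- droso$positive
negative <- droso$negative
test <- droso$test

# use the first 200 sequences of each class
pos <- positive[1:200]
neg <- negative[1:200]
tst <- test

enc <- Predoss.Feature(positive_class = pos, negative_class = neg, test_seq = tst)
dim(enc)   # number of test sequences x (sequence length)^2
enc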

[Package EncDNA version 1.0.2 Index]