
Trint.Dist.Feature {EncDNA}

R Documentation

Tri-nucleotide distribution-based encoding of nucleotide sequences.

Description

This encoding scheme was first adopted by Wei et al. (2013), together with MM1 features, for the prediction of splice sites. In this encoding technique, the distribution of trinucleotides is taken into consideration independently for the exon and intron regions of splice site motifs.

Usage

Trint.Dist.Feature(test_seq)

Arguments

test_seq

Sequence dataset to be transformed into numeric feature vectors. It must be an object of class DNAStringSet containing at least two sequences.
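
As a minimal sketch of preparing such an input, the snippet below builds a DNAStringSet from plain character strings. It assumes the Bioconductor package Biostrings is installed; the sequence names and contents are made up for illustration and are not real splice site motifs.

```r
# Sketch only: requires the Bioconductor package Biostrings.
if (requireNamespace("Biostrings", quietly = TRUE)) {
  # Two made-up sequences (hypothetical data, for illustration only).
  raw_seqs <- c(seq1 = "ACGTACGTACGT", seq2 = "TTGACCGTAAGC")

  # Convert the character vector into a DNAStringSet.
  seq_set <- Biostrings::DNAStringSet(raw_seqs)

  # Check the two requirements stated for test_seq.
  stopifnot(methods::is(seq_set, "DNAStringSet"), length(seq_set) >= 2)
}
```

An object built this way could then be passed as test_seq, provided the sequences are suitable for the encoding.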

Details

This encoding scheme is independent of positive and negative datasets. In other words, each sequence can be encoded independently. Further, a nucleotide sequence of any length is transformed into a numeric vector of 64 values, one for each of the 64 possible trinucleotides.
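
To illustrate how a sequence of any length can map to 64 values, the following base R sketch computes the relative frequency of each overlapping trinucleotide over a whole sequence. It is a simplified illustration, not the package's implementation: the helper name trint_freq_sketch is hypothetical, the use of relative frequencies (rather than raw counts) is an assumption, and the separate treatment of exon and intron regions is ignored.

```r
# All 64 trinucleotides in a fixed order (AAA, AAC, ..., TTT).
bases <- c("A", "C", "G", "T")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
trinucs <- paste0(grid$first, grid$second, grid$third)

# Hypothetical helper: relative frequency of each overlapping trinucleotide.
# Assumes the sequence contains only A, C, G and T.
trint_freq_sketch <- function(dna) {
  dna <- toupper(dna)
  n <- nchar(dna)
  stopifnot(n >= 3)
  starts <- seq_len(n - 2)
  words <- substring(dna, starts, starts + 2)   # overlapping 3-mers
  counts <- table(factor(words, levels = trinucs))
  as.numeric(counts) / length(words)
}

# Two made-up sequences of different lengths.
seqs <- c(s1 = "ACGTACGTAC", s2 = "TTTTGGGGCCCCAAAA")

# One row per sequence, one column per trinucleotide.
enc_sketch <- t(vapply(seqs, trint_freq_sketch, numeric(64)))
colnames(enc_sketch) <- trinucs
dim(enc_sketch)
```

Each sequence is processed on its own, so no reference to positive or negative training data is needed, and both sequences yield a row of length 64 despite their different lengths.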

Value

A numeric matrix of order m*64, where m is the number of sequences in test_seq.

Author(s)

Prabina Kumar Meher, Indian Agricultural Statistics Research Institute, New Delhi-110012, INDIA

References

Wei, D., Zhang, H., Wei, Y. and Jiang, Q. (2013). A novel splice site prediction method using support vector machine. J Comput Inform Syst., 9(20): 8053-8060.

Examples

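library(EncDNA)      # load the package that provides Trint.Dist.Feature
data(droso)          # example dataset shipped with the package
tst <- droso$test    # test sequences from the droso dataset
enc <- Trint.Dist.Feature(test_seq=tst)
enc                  # one row per sequence, 64 columns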

[Package EncDNA version 1.0.2 Index]