WAM.Feature {EncDNA}R Documentation

Nucleic acid sequence encoding based on weighted array model.

Description

Unlike weighted matrix method (WMM), first order nucleotide dependencies are accounted in weighted array model (WAM). The WAM was introduced by Zhang and Marr (1993) for locating splicing signal on nuclotide sequences. The WAM was employed by Meher et al. (2016) for encoding of splice site motifs.

Usage

WAM.Feature(positive_class, negative_class, test_seq)

Arguments

positive_class

Sequence dataset of the positive class, must be an object of class DNAStringSet.

negative_class

Sequence dataset of the negative class, must be an object of class DNAStringSet.

test_seq

Sequences to be encoded into numeric vectors, must be an object of class DNAStringSet.

Details

In this encoding approach, a vector of two observations will be obtained for each sequence, corresponds to the situation when only positive class and both positive & neagtive datasets are used for encoding. This encoding scheme is also invariant to the length of the sequence.

Value

A numeric matrix of order m*2, where m is the number of sequences in test_seq.

Author(s)

Prabina Kumar Meher, Indian Agricultural Statistics Research Institute, New Delhi-110012, INDIA

References

  1. Zhang, M. and Marr, T. (1993). A weight array method for splicing signal analysis. Comput Appl Biosci., 9(5): 499-509.

  2. Meher, P.K., Sahu, T.K., Rao, A.R. and Wahi, S.D. (2016). Identification of donor splice sites using support vector machine: a computational approach based on positional, compositional and dependency features. Algorithms for Molecular Biology, 11(1): 16.

See Also

MM1.Feature, PN.Fdtf.Feature

Examples

data(droso)
positive <- droso$positive
negative <- droso$negative
test <- droso$test
pos <- positive[1:200]
neg <- negative[1:200]
tst <- test
enc <- WAM.Feature(positive_class=pos, negative_class=neg, test_seq=tst)
enc

[Package EncDNA version 1.0.2 Index]