
Density.Feature {EncDNA}

R Documentation

Nucleotide sequence encoding with the distribution of nucleotides.

Description

Each nucleotide sequence is encoded into a numeric vector of the same length as the sequence, based on the distribution of nucleotides over the sequence. Datasets from two classes are not required for encoding; instead, each sequence is encoded independently. This encoding scheme was introduced by Wei et al. (2013), together with MM1.Feature, for the prediction of human donor and acceptor splice sites.
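This page does not spell out the exact formula. The sketch below assumes one common definition of nucleotide density: the value at position i is the fraction of positions 1..i that carry the same nucleotide as position i. The function name density_encode_one is hypothetical and is not part of EncDNA, and the assumed definition may differ from what Density.Feature actually computes.

```r
# Hypothetical helper, not part of EncDNA. Assumed definition: the value at
# position i is the fraction of positions 1..i holding the same nucleotide
# as position i.
density_encode_one <- function(x) {
  bases <- strsplit(toupper(x), "")[[1]]     # split "ACGT..." into characters
  n <- length(bases)
  vapply(seq_len(n), function(i) {
    # count occurrences of bases[i] in the prefix 1..i, divided by i
    sum(bases[seq_len(i)] == bases[i]) / i
  }, numeric(1))
}

density_encode_one("ACGA")
# [1] 1.0000000 0.5000000 0.3333333 0.5000000
```

The output has one value per position, so its length equals the sequence length, matching the description above.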

Usage

Density.Feature(test_seq)

Arguments

test_seq

The sequence dataset to be encoded; must be an object of class DNAStringSet.

Details

An object of class DNAStringSet can be obtained by reading FASTA sequences with the function readDNAStringSet, available in the Biostrings package of Bioconductor.
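As a minimal sketch, a small FASTA file can be written to a temporary location and read back as a DNAStringSet. The sequences and file are made up for illustration, and the read step is skipped when Biostrings is not installed.

```r
# Write two made-up sequences to a temporary FASTA file.
fasta <- tempfile(fileext = ".fa")
writeLines(c(">seq1", "ACGTACGT", ">seq2", "TTGCAACG"), fasta)

# Read the file back as a DNAStringSet (requires Bioconductor's Biostrings).
if (requireNamespace("Biostrings", quietly = TRUE)) {
  seqs <- Biostrings::readDNAStringSet(fasta, format = "fasta")
  print(seqs)
}
```

The resulting object can then be passed as test_seq.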

Value

A numeric matrix of order m × n, where m is the number of sequences in test_seq and n is the length of the sequences.
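To illustrate the m × n shape, the hypothetical helper below (not part of EncDNA) encodes a character vector of equal-length sequences with one row per sequence and one column per position. It assumes that entry [k, j] is the fraction of positions 1..j in sequence k that hold the same nucleotide as position j; this is an assumed definition, not necessarily the one used by Density.Feature. A DNAStringSet can be converted with as.character() before being passed in.

```r
# Hypothetical helper, not part of EncDNA. Assumed definition: entry [k, j]
# is the fraction of positions 1..j in sequence k holding the same nucleotide
# as position j of sequence k.
density_matrix_sketch <- function(seqs) {
  seqs <- toupper(as.character(seqs))
  stopifnot(length(unique(nchar(seqs))) == 1L)   # rows need equal length
  chars <- do.call(rbind, strsplit(seqs, ""))    # m x n character matrix
  m <- nrow(chars)
  n <- ncol(chars)
  dens <- vapply(seq_len(n), function(j) {
    # compare columns 1..j with column j; the length-m vector is recycled
    # down each column, so row k is compared against chars[k, j]
    rowSums(chars[, seq_len(j), drop = FALSE] == chars[, j]) / j
  }, numeric(m))
  matrix(dens, nrow = m, ncol = n)               # keep matrix shape when m = 1
}

enc <- density_matrix_sketch(c("ACGA", "TTAT"))
dim(enc)
# [1] 2 4
```

Comparing whole columns at once keeps the loop over positions only, rather than over every sequence and position.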

Author(s)

Prabina Kumar Meher, Indian Agricultural Statistics Research Institute, New Delhi-110012, INDIA

References

Bari, A.T.M.G., Reaz, M.R. and Jeong, B.S. (2014). Effective DNA encoding for splice site prediction using SVM. MATCH Commun. Math. Comput. Chem., 71: 241-258.

Examples

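library(EncDNA)
data(droso)            # example dataset shipped with EncDNA
test <- droso$test
tst <- test[1:5]       # first five test sequences
enc <- Density.Feature(test_seq = tst)
enc                    # numeric matrix with one row per sequence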

[Package EncDNA version 1.0.2 Index]