dna2vector {astsa} | R Documentation |
Convert DNA Sequence to Indicator Vectors
Description
Takes a string (e.g., a DNA sequence) of general form (e.g., FASTA) and converts it to a sequence of indicator vectors for use with the Spectral Envelope (specenv).
Usage
dna2vector(data, alphabet = NULL)
Arguments
data | A single string. |
alphabet | The particular alphabet being used. The default is |
Details
Takes a string of categories and converts it to a matrix of indicators. The data can then be used by the script specenv, which calculates the Spectral Envelope of the sequence (or subsequence). Many different types of sequences can be used, including FASTA and GenBank, as long as the data is a string of categories.
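To make the idea of an indicator matrix concrete, here is a minimal base R sketch. The helper to_indicators is hypothetical and is not the astsa source. It assumes an alphabet of "A", "C", "G", "T" and simply drops any character outside that alphabet (digits, spaces, line breaks); whether dna2vector treats such characters the same way is an assumption.

```r
# Hypothetical helper (not the astsa implementation):
# one row per position, one indicator column per letter of the alphabet.
to_indicators <- function(s, alphabet = c("A", "C", "G", "T")) {
  chars <- strsplit(s, "")[[1]]             # split the string into single characters
  chars <- chars[chars %in% alphabet]       # assumption: drop anything not in the alphabet
  ind <- outer(chars, alphabet, "==") * 1   # 1 where the character matches the column letter
  colnames(ind) <- alphabet
  ind
}

x <- to_indicators("ACGTTA")
dim(x)       # positions by letters
rowSums(x)   # each row contains exactly one 1
```

Each row of x marks which letter occurs at that position, which is the kind of categorical-to-numeric coding the Spectral Envelope works with.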
The indicator vectors (as a matrix) are returned invisibly, so that if the user forgets to assign the result to an object, the entire sequence does not scroll across the screen. In other words, the user should do something like xdata = dna2vector(data), where data is the original sequence.
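For readers unfamiliar with invisible returns in R, the following toy sketch shows the behavior. The function f is hypothetical and unrelated to astsa; it only illustrates base R's invisible().

```r
# Hypothetical toy function that returns its value invisibly
f <- function() invisible(matrix(0, 2, 2))

f()                     # nothing is printed at the console
x <- f()                # the value is still returned and can be assigned
vis <- withVisible(f()) # base R helper that reports the visibility flag
vis$visible
```

Assigning the result, as in the x <- f() line, is the pattern the paragraph above recommends for dna2vector.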
As an example, if the DNA sequence is in a FASTA file, say sequence.fasta, remove the first line (which will look like >V01555.2 ...). Then the following code can be used to read the data into the session, create the indicator sequence, and save it as a compressed R data file:
fileName <- 'sequence.fasta'                          # name of FASTA file
data <- readChar(fileName, file.info(fileName)$size)  # input the sequence
myseq <- dna2vector(data)                             # convert it to indicators
##== to compress and save the data ==##
save(myseq, file='myseq.rda')
##== and then load it when needed ==##
load('myseq.rda')
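If editing the file by hand is inconvenient, the header line can also be dropped inside R. This is a sketch using base R only. The temporary file and its contents are hypothetical stand-ins for sequence.fasta, and it assumes header lines are exactly those beginning with ">".

```r
# Hypothetical stand-in for 'sequence.fasta': a header line plus two sequence lines
fileName <- tempfile(fileext = ".fasta")
writeLines(c(">hypothetical header", "AGAATTCGTC", "TTGCTCTATT"), fileName)

lines <- readLines(fileName)               # one element per line of the file
lines <- lines[!startsWith(lines, ">")]    # assumption: header lines start with ">"
data  <- paste(lines, collapse = "")       # join into a single string
nchar(data)                                # length of the sequence without the header
```

The resulting data is a single string and could then be passed to dna2vector as in the code above.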
Value
matrix of indicator vectors; returned invisibly
Author(s)
D.S. Stoffer
References
You can find demonstrations of astsa capabilities at FUN WITH ASTSA.
The most recent version of the package can be found at https://github.com/nickpoison/astsa/.
In addition, the News and ChangeLog files are at https://github.com/nickpoison/astsa/blob/master/NEWS.md.
The webpages for the texts and some help on using R for time series analysis can be found at https://nickpoison.github.io/.
See Also
Examples
# Epstein-Barr virus (entire sequence included in astsa)
xdata = dna2vector(EBV)
head(xdata)
# part of EBV with 1, 2, 3, 4 for "A", "C", "G", "T"
xdata = dna2vector(bnrf1ebv)
head(xdata)
# raw GenBank sequence
data <-
c("1 agaattcgtc ttgctctatt cacccttact tttcttcttg cccgttctct ttcttagtat
61 gaatccagta tgcctgcctg taattgttgc gccctacctc ttttggctgg cggctattgc")
xdata = dna2vector(data, alphabet=c('a', 'c', 'g', 't'))
head(xdata)
# raw FASTA sequence
data <-
c("AGAATTCGTCTTGCTCTATTCACCCTTACTTTTCTTCTTGCCCGTTCTCTTTCTTAGTATGAATCCAGTA
TGCCTGCCTGTAATTGTTGCGCCCTACCTCTTTTGGCTGGCGGCTATTGCCGCCTCGTGTTTCACGGCCT")
xdata = dna2vector(data)
head(xdata)