read_SS {LncFinder} | R Documentation |
Read Secondary Structure Information
Description
This function reads secondary structure information from your
own file(s) instead of obtaining it from the function run_RNAfold.
It is useful if users already have secondary structure sequences (Dot-Bracket Notation).
Usage
read_SS(
oneFile.loc,
seqRNA.loc,
seqSS.loc,
separateFile = TRUE,
withMFE = TRUE
)
Arguments
oneFile.loc |
String. The location of your sequence file. This file should contain
one (and only one) RNA sequence and its secondary structure sequence in Dot-Bracket Notation.
This parameter needs to be defined only when separateFile = FALSE. |
seqRNA.loc |
String. The location of your RNA sequences file (FASTA format). If your
RNA sequences and secondary structure sequences are in two files, you need to define the
locations of both files. Files with multiple sequences are supported
for this option. This parameter needs to be defined only when separateFile = TRUE. |
seqSS.loc |
String. The location of your secondary structure sequences file (FASTA format). |
separateFile |
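Logical. Whether your RNA sequence(s) and secondary structure sequence(s) are in
separate files. If TRUE, seqRNA.loc and seqSS.loc need to be defined; if FALSE, oneFile.loc needs to be defined. |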
withMFE |
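Logical. Whether the MFE (minimum free energy) is provided at the end of the secondary structure sequence.
If TRUE, the MFE is returned in the third row of the result (see Value). |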
Details
Users who want to predict sequences with secondary structure features may already have
their own secondary structure (SS) sequences. With this function, users can read SS information
from their files. Two kinds of files are supported: RNA sequence and SS sequence in one file
(separateFile = FALSE) or in separate files (separateFile = TRUE).
separateFile = FALSE is intended for secondary structures obtained from popular
programs such as RNAfold. In this case, the output file contains only one RNA sequence and
its SS, and the file has only two rows: the RNA sequence and its SS sequence. This
option is therefore preferable when the file holds a single sequence whose layout matches
the output format of RNAfold.
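To make the one-file layout concrete, here is a minimal sketch in base R. It writes a hypothetical two-line file (the sequence, structure and MFE value are made up for illustration) and splits the second line into the dot-bracket string and the MFE. The helper name parse_fold_lines is hypothetical, and this is not how read_SS is implemented; it only illustrates the layout described above.

```r
# Hypothetical two-line file in the layout described above:
# line 1 = RNA sequence, line 2 = dot-bracket structure followed by the MFE.
fold_file <- tempfile(fileext = ".fold")
writeLines(c("GGGAAACCC", "(((...))) (-1.20)"), fold_file)

# Illustrative helper (not part of LncFinder): split the two lines into fields.
parse_fold_lines <- function(path) {
  lines <- readLines(path)
  stopifnot(length(lines) == 2)
  # The structure is the leading run of '.', '(' and ')' characters.
  ss <- regmatches(lines[2], regexpr("^[.()]+", lines[2], perl = TRUE))
  # The MFE is the signed number inside the trailing parentheses.
  mfe_txt <- sub("^[.()]+\\s*\\(\\s*(-?[0-9.]+)\\s*\\)\\s*$", "\\1",
                 lines[2], perl = TRUE)
  list(seq = lines[1], ss = ss, mfe = as.numeric(mfe_txt))
}

parsed <- parse_fold_lines(fold_file)
```

With a real file of this shape, the documented call would be read_SS(oneFile.loc = ..., separateFile = FALSE, withMFE = TRUE).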
If users obtained the SS sequences from experiments, the RNA sequences and SS sequences may be in two
files. In this case, users can select separateFile = TRUE. Both files should be in FASTA
format, and each file can contain multiple sequences. The sequences in the two files must be in the same
order. If your data are obtained from experiments or other sources, it is highly recommended
that you build a new model with these data, since the SS sequences of the pre-built models were
obtained from RNAfold and may differ considerably from experimental data.
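The two-file case can be sketched as follows. The file names and the toy records are assumptions made up for illustration; only the read_SS arguments come from the Usage section. The call is guarded so that the snippet also runs where LncFinder is not installed.

```r
# Hypothetical inputs: two FASTA files whose records are in the same order.
rna_file <- file.path(tempdir(), "demo.RNA.fa")
ss_file  <- file.path(tempdir(), "demo.SS.fa")
writeLines(c(">seq1", "GGGAAACCC", ">seq2", "GCGCAAAAGCGC"), rna_file)
writeLines(c(">seq1", "(((...)))", ">seq2", "((((....))))"), ss_file)

# Call read_SS only if LncFinder is installed; the arguments are those
# listed under Usage. withMFE = FALSE because these made-up structure
# records carry no MFE value.
if (requireNamespace("LncFinder", quietly = TRUE)) {
  result <- LncFinder::read_SS(seqRNA.loc = rna_file, seqSS.loc = ss_file,
                               separateFile = TRUE, withMFE = FALSE)
}
```

Keeping the record names identical in both files is a convenient way to confirm by eye that the order matches.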
Value
A data frame. The first row is the RNA sequence, the second row is the secondary structure sequence in Dot-Bracket Notation, and the third row is the MFE (if MFE is provided).
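As a rough illustration of this layout, the sketch below builds a stand-in data frame by hand and runs a simple sanity check on it. The column name seq1, the storage of the MFE as text, and the helper is_valid_ss are assumptions for the example, not guaranteed properties of the read_SS output.

```r
# Hand-built stand-in with the documented layout (not real read_SS output):
# row 1 = RNA sequence, row 2 = dot-bracket structure, row 3 = MFE.
mock <- data.frame(seq1 = c("GGGAAACCC", "(((...)))", "-1.20"),
                   stringsAsFactors = FALSE)

# Illustrative check (hypothetical helper): the structure must be non-empty,
# as long as the sequence, and have balanced brackets.
is_valid_ss <- function(seq, ss) {
  if (nchar(ss) == 0 || nchar(seq) != nchar(ss)) return(FALSE)
  chars <- strsplit(ss, "")[[1]]
  # Running bracket depth: +1 for '(', -1 for ')', 0 for '.'.
  depth <- cumsum((chars == "(") - (chars == ")"))
  all(depth >= 0) && depth[length(depth)] == 0
}

ok  <- is_valid_ss(mock[1, "seq1"], mock[2, "seq1"])
mfe <- as.numeric(mock[3, "seq1"])
```

A check of this kind can catch mismatched record order early when sequences and structures come from separate files.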
Author(s)
HAN Siyu
See Also
Examples
## Not run:
### Load sequence data
data("demo_DNA.seq")
Seqs <- demo_DNA.seq[1:4]
### Convert sequences from vector to string.
Seqs <- sapply(Seqs, seqinr::getSequence, as.string = TRUE)
### Write a fasta file.
seqinr::write.fasta(Seqs, names = names(Seqs), file.out = "tmp.RNA.fa", as.string = TRUE)
### For Windows system: (Your path of RNAfold.)
RNAfold.path <- '"E:/Program Files/ViennaRNA/RNAfold.exe"'
### Define the parameters of RNAfold. See documents of RNAfold for more information.
RNAfold.command <- paste(RNAfold.path, " --noPS -i tmp.RNA.fa -o output", sep = "")
### Run RNAfold and output four result files.
system(RNAfold.command)
### Read secondary structure information from one file.
result_1 <- read_SS(oneFile.loc = "output_ENST00000510062.1.fold",
                    separateFile = FALSE, withMFE = TRUE)
### Read secondary structure sequences from multiple files.
### The pattern is a regular expression: escape the dot and anchor the end.
filePath <- dir(pattern = "\\.fold$")
result_2 <- sapply(filePath, read_SS, separateFile = FALSE, withMFE = TRUE)
result_2 <- as.data.frame(result_2)
## End(Not run)