make_referFreq {LncFinder}    R Documentation
Make Frequencies File for Log.Dist, Euc.Dist, and hexamer score
Description
This function calculates the k-mer frequencies of lncRNAs and CDSs (coding sequences).
The resulting frequencies file can be used to calculate Logarithm-Distance (compute_LogDistance),
Euclidean-Distance (compute_EucDistance), and hexamer score (compute_hexamerScore).
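To illustrate how reference frequencies can feed a distance feature, the sketch below computes a plain Euclidean distance from a query's k-mer frequency vector to a coding and a non-coding reference vector. The helper euc_dist_sketch and the toy 2-mer profiles are hypothetical; this is a generic distance, not necessarily how compute_EucDistance builds or combines its values.

```r
# Hypothetical helper (not part of LncFinder): Euclidean distance from a
# query k-mer frequency vector to coding and non-coding reference vectors.
euc_dist_sketch <- function(query, ref_cds, ref_lnc) {
  # All three vectors must describe the same k-mers in the same order
  stopifnot(identical(names(query), names(ref_cds)),
            identical(names(query), names(ref_lnc)))
  c(to_cds = sqrt(sum((query - ref_cds)^2)),
    to_lnc = sqrt(sum((query - ref_lnc)^2)))
}

# Toy 2-mer profiles, invented for illustration only
kmers   <- c("aa", "ac", "ca", "cc")
ref_cds <- setNames(c(0.1, 0.4, 0.4, 0.1), kmers)
ref_lnc <- setNames(c(0.4, 0.1, 0.1, 0.4), kmers)
query   <- setNames(c(0.2, 0.3, 0.3, 0.2), kmers)

d <- euc_dist_sketch(query, ref_cds, ref_lnc)
d  # to_cds = 0.2, to_lnc = 0.4: the query is closer to the coding profile
```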
NOTE: To make a frequencies file for building a new LncFinder classifier with
function extract_features, please use function make_frequencies instead.
Usage
make_referFreq(
  cds.seq,
  lncRNA.seq,
  k = 6,
  step = 1,
  alphabet = c("a", "c", "g", "t"),
  on.orf = TRUE,
  ignore.illegal = TRUE
)
Arguments
cds.seq
Coding sequences (mRNA without UTRs). Can be a FASTA file loaded by
package "seqinr" (seqinr::read.fasta).
lncRNA.seq
Long non-coding RNA sequences. Can be a FASTA file loaded by
package "seqinr" (seqinr::read.fasta).
k
An integer that indicates the sliding window size, i.e. the length of the
k-mers that are counted. (Default: 6)
step
Integer defaulting to 1 for the window step.
alphabet
A vector of single characters that specifies the alphabet of the
sequences. (Default: c("a", "c", "g", "t"))
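on.orf
Logical. Incomplete CDSs can shift the reading frame and lead to
inaccurate hexamer frequencies. When on.orf = TRUE, the frequencies are
calculated on the ORF region (see Details). (Default: TRUE)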
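ignore.illegal
Logical. If TRUE, sequences containing characters that are not in
alphabet will be ignored when calculating the frequencies. (Default: TRUE)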
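The roles of k, step, and alphabet can be pictured with a small base-R sketch of sliding-window k-mer counting. The function kmer_freq_sketch is hypothetical and only illustrates the idea; it is not the code used by make_referFreq. Windows containing characters outside alphabet are simply dropped here, which is an assumption of this sketch.

```r
# Hypothetical sketch (not LncFinder's implementation): relative k-mer
# frequencies of one sequence, using a window of size k moved by 'step'.
kmer_freq_sketch <- function(x, k = 6, step = 1,
                             alphabet = c("a", "c", "g", "t")) {
  x <- tolower(x)
  # Every possible k-mer over the alphabet (4^k of them for DNA)
  all_kmers <- do.call(paste0, expand.grid(rep(list(alphabet), k),
                                           stringsAsFactors = FALSE))
  zero <- setNames(numeric(length(all_kmers)), all_kmers)
  n <- nchar(x)
  if (n < k) return(zero)
  starts <- seq(1, n - k + 1, by = step)
  words  <- substring(x, starts, starts + k - 1)
  # Words with characters outside 'alphabet' become NA and are not counted
  counts <- table(factor(words, levels = all_kmers))
  total  <- sum(counts)
  if (total == 0) return(zero)
  setNames(as.numeric(counts) / total, all_kmers)
}

f <- kmer_freq_sketch("ACGTACGT", k = 2, step = 1)
f[c("ac", "cg", "gt", "ta")]  # 2/7, 2/7, 2/7, 1/7
```

With k = 6 the result has 4^6 = 4096 entries, one per hexamer.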
Details
This function makes the frequencies file for the computation of
Logarithm-Distance (compute_LogDistance), Euclidean-Distance
(compute_EucDistance), and hexamer score (compute_hexamerScore).
To achieve high accuracy, full mRNAs should not be treated as CDSs and assigned
to parameter cds.seq. However, the available CDSs of some species may be insufficient
for calculating frequencies. In that case, mRNAs can be used in place of CDSs with parameter
on.orf = TRUE, and the frequencies will then be calculated on the ORF region.
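As a rough picture of what "calculated on the ORF region" involves, the sketch below finds the longest forward-strand ORF (ATG up to the first in-frame stop codon) in a sequence. The function longest_orf_sketch is hypothetical; its conventions (stop codon included, ORFs without a stop codon ignored, reverse strand not scanned) are assumptions and may differ from how make_referFreq locates ORFs.

```r
# Hypothetical sketch: longest ORF on the forward strand, from ATG to the
# first in-frame stop codon (stop included). Not LncFinder's ORF finder.
longest_orf_sketch <- function(x) {
  x <- tolower(x)
  stops <- c("taa", "tag", "tga")
  best <- ""
  for (frame in 0:2) {
    n_codons <- (nchar(x) - frame) %/% 3
    if (n_codons < 1) next
    starts <- frame + 1 + 3 * (0:(n_codons - 1))
    codons <- substring(x, starts, starts + 2)
    i <- 1
    while (i <= n_codons) {
      if (codons[i] == "atg") {
        j <- i
        while (j <= n_codons && !(codons[j] %in% stops)) j <- j + 1
        if (j <= n_codons) {  # an in-frame stop codon closes the ORF
          orf <- paste(codons[i:j], collapse = "")
          if (nchar(orf) > nchar(best)) best <- orf
          i <- j  # continue scanning after this stop codon
        } else break  # no stop codon left in this frame
      }
      i <- i + 1
    }
  }
  best
}

longest_orf_sketch("ccATGAAATTTTAGgg")  # "atgaaattttag"
```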
If on.orf = TRUE, users can set step = 3 to simulate the translation process,
so that the window moves along the ORF codon by codon.
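The effect of step on which hexamers are counted can be seen in a short base-R sketch. The helper hexamers_sketch is hypothetical; it only lists the windows a sliding window of size k visits. On an ORF, step = 1 mixes all three reading frames, while step = 3 keeps every hexamer aligned to codon boundaries.

```r
# Hypothetical helper: the k-mers visited by a sliding window of size k
# that advances 'step' positions at a time.
hexamers_sketch <- function(x, k = 6, step = 1) {
  x <- tolower(x)
  n <- nchar(x)
  if (n < k) return(character(0))
  starts <- seq(1, n - k + 1, by = step)
  substring(x, starts, starts + k - 1)
}

orf <- "atgaaatttggctaa"         # 5 codons: atg aaa ttt ggc taa
hexamers_sketch(orf, step = 1)   # 10 overlapping hexamers from all frames
hexamers_sketch(orf, step = 3)   # "atgaaa" "aaattt" "tttggc" "ggctaa"
```

Each step = 3 hexamer is a pair of adjacent codons, which is why this setting approximates how a ribosome reads the ORF.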
Value
Returns a list consisting of the frequencies of protein-coding sequences and non-coding sequences.
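One common way to turn coding and non-coding hexamer frequencies into a hexamer score is the CPAT-style log-likelihood ratio, averaged over the hexamers of a query sequence. The sketch below assumes the two frequency sets are available as named numeric vectors; the actual structure of the list returned by make_referFreq, and the exact formula used by compute_hexamerScore, may differ. Skipping hexamers with a zero or missing frequency is also an assumption of this sketch.

```r
# Hypothetical sketch: mean log(F_cds(h) / F_lnc(h)) over a query's hexamers.
# 'freq_cds' and 'freq_lnc' are assumed to be named numeric vectors.
hexamer_score_sketch <- function(hexamers, freq_cds, freq_lnc) {
  fc <- freq_cds[hexamers]  # NA for hexamers absent from the reference
  fl <- freq_lnc[hexamers]
  ok <- !is.na(fc) & !is.na(fl) & fc > 0 & fl > 0
  if (!any(ok)) return(0)
  mean(log(fc[ok] / fl[ok]))
}

# Toy frequencies, invented for illustration only
freq_cds <- c(atgaaa = 0.02, aaattt = 0.01, tttggc = 0.01)
freq_lnc <- c(atgaaa = 0.01, aaattt = 0.01, tttggc = 0.02)
s <- hexamer_score_sketch(c("atgaaa", "atgaaa", "aaattt"), freq_cds, freq_lnc)
s  # 2 * log(2) / 3, positive, i.e. leaning towards coding
```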
References
Siyu Han, Yanchun Liang, Qin Ma, Yangyi Xu, Yu Zhang, Wei Du, Cankun Wang & Ying Li. LncFinder: an integrated platform for long non-coding RNA identification utilizing sequence intrinsic composition, structural information, and physicochemical property. Briefings in Bioinformatics, 2019, 20(6):2009-2027.
Author(s)
HAN Siyu
See Also
make_frequencies, compute_LogDistance, compute_EucDistance, compute_hexamerScore.
Examples
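## Not run:
library(LncFinder)

# Load example sequences; the same file is used as both CDSs and lncRNAs
# purely for demonstration
Seqs <- seqinr::read.fasta(file =
"http://www.ncbi.nlm.nih.gov/WebSub/html/help/sample_files/nucleotide-sample.txt")

# Compute reference hexamer frequencies on the ORF regions
referFreq <- make_referFreq(cds.seq = Seqs, lncRNA.seq = Seqs, k = 6, step = 1,
                            alphabet = c("a", "c", "g", "t"), on.orf = TRUE,
                            ignore.illegal = TRUE)
## End(Not run)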