lnc_finder {LncFinder}    R Documentation

Long Non-coding RNA Identification

Description

This function predicts whether sequences are non-coding transcripts or protein-coding transcripts.

Usage

lnc_finder(
  Sequences,
  SS.features = FALSE,
  format = "DNA",
  frequencies.file = "human",
  svm.model = "human",
  parallel.cores = 2
)

Arguments

Sequences

Unevaluated sequences. Can be a FASTA file loaded with the seqinr package or secondary structure sequences (Dot-Bracket Notation) obtained from function run_RNAfold. If Sequences contains secondary structure sequences, parameter format must be set to "SS".

SS.features

Logical. If SS.features = TRUE, secondary structure features will be used.

format

String. Defines the format of Sequences. Can be "DNA" or "SS": "DNA" for DNA sequences and "SS" for secondary structure sequences.

frequencies.file

String or a list obtained from function make_frequencies. Input species name "human", "mouse" or "wheat" to use the pre-built frequencies files, or assign a user's own frequencies file (see function make_frequencies).

svm.model

String or an SVM model obtained from function build_model or svm_tune. Input species name "human", "mouse" or "wheat" to use the pre-built models, or assign a user's own model (see function build_model).

parallel.cores

Integer. The number of cores for parallel computation. The default is 2. Set to -1 to run this function with all cores.
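
As a minimal sketch of preparing the Sequences argument, the code below writes a tiny FASTA file and loads it with read.fasta from the seqinr package. The file contents and the record names tx1 and tx2 are made up for illustration, and the sketch stops before calling lnc_finder.

```r
library(seqinr)

# Write a tiny two-record FASTA file to a temporary location (made-up data).
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">tx1", "ATGGCGTACGTTAGC", ">tx2", "ATGCCCGGGAAATTTTAG"), fasta_path)

# Load the file with seqinr; the result is a named list, one entry per record.
Seqs <- read.fasta(file = fasta_path, seqtype = "DNA")

print(names(Seqs))
print(length(Seqs))
```

A list loaded this way is the kind of object the Sequences argument describes for format = "DNA".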

Details

Because obtaining secondary structure sequences is time-consuming, users can input nucleotide sequences and predict them without secondary structure features (set SS.features to FALSE).

Please note that:

SS.features can improve the performance when the species of the unevaluated sequences is identical to the species of the sequences used to build the model.

However, if users are trying to predict sequences with the model trained on other species, SS.features may lead to low accuracy.

For the details of frequencies.file, please refer to function make_frequencies.

For the details of the features, please refer to function extract_features.

Value

Returns a data.frame containing the prediction results (Pred), the coding potential (Coding.Potential), and the features. For details of the features, please refer to function extract_features.
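
As a rough illustration of working with the returned data.frame, the sketch below uses a small hand-made stand-in for the output of lnc_finder. The column names Pred and Coding.Potential come from the description above. The label values "NonCoding" and "Coding", the row names, and the numbers are assumptions made up for this example and may not match real output.

```r
# Hypothetical stand-in for the data.frame returned by lnc_finder();
# the values and the "NonCoding"/"Coding" labels are assumptions.
result <- data.frame(
  Pred = c("NonCoding", "Coding", "NonCoding"),
  Coding.Potential = c(0.08, 0.93, 0.31),
  row.names = c("seq_1", "seq_2", "seq_3"),
  stringsAsFactors = FALSE
)

# Count how many sequences fall into each predicted class.
pred_counts <- table(result$Pred)

# Keep only the sequences predicted as non-coding.
noncoding <- result[result$Pred == "NonCoding", , drop = FALSE]

# Order the sequences by coding potential, lowest first.
ranked <- result[order(result$Coding.Potential), , drop = FALSE]

print(pred_counts)
print(rownames(noncoding))
print(rownames(ranked))
```

The same subsetting works on a real result, since the feature columns simply travel along with each row.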

References

Siyu Han, Yanchun Liang, Qin Ma, Yangyi Xu, Yu Zhang, Wei Du, Cankun Wang & Ying Li. LncFinder: an integrated platform for long non-coding RNA identification utilizing sequence intrinsic composition, structural information, and physicochemical property. Briefings in Bioinformatics, 2019, 20(6):2009-2027.

Author(s)

HAN Siyu

See Also

build_model, make_frequencies, extract_features, run_RNAfold, read_SS.

Examples

## Not run: 
data(demo_DNA.seq)
Seqs <- demo_DNA.seq

### Input one sequence:
OneSeq <- Seqs[1]
result_1 <- lnc_finder(OneSeq, SS.features = FALSE, format = "DNA",
                       frequencies.file = "human", svm.model = "human",
                       parallel.cores = 2)

### Or several sequences:
data(demo_SS.seq)
Seqs <- demo_SS.seq
result_2 <- lnc_finder(Seqs, SS.features = TRUE, format = "SS",
                       frequencies.file = "mouse", svm.model = "mouse",
                       parallel.cores = 2)

### A complete workflow:
### Calculate secondary structure on Windows OS (run_RNAfold takes
### nucleotide sequences, so start from the DNA demo data),
RNAfold.path <- '"E:/Program Files/ViennaRNA/RNAfold.exe"'
SS.seq <- run_RNAfold(demo_DNA.seq, RNAfold.path = RNAfold.path, parallel.cores = 2)

### Predict the sequences with secondary structure features,
result_2 <- lnc_finder(SS.seq, SS.features = TRUE, format = "SS",
                       frequencies.file = "mouse", svm.model = "mouse",
                       parallel.cores = 2)

### Predict sequences with your own model by assigning a new svm.model and
### frequencies.file to parameters "svm.model" and "frequencies.file"
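### A minimal sketch of that step, not a tested recipe. The objects
### my_cds.seq, my_mRNA.seq and my_lncRNA.seq are hypothetical training sets
### loaded with seqinr, and the argument names used below are assumed from
### the help pages of make_frequencies and build_model; check
### ?make_frequencies and ?build_model before running.
my_file <- make_frequencies(cds.seq = my_cds.seq, mRNA.seq = my_mRNA.seq,
                            lncRNA.seq = my_lncRNA.seq, SS.features = FALSE)
my_model <- build_model(lncRNA.seq = my_lncRNA.seq, mRNA.seq = my_mRNA.seq,
                        frequencies.file = my_file, SS.features = FALSE)
result_3 <- lnc_finder(demo_DNA.seq, SS.features = FALSE, format = "DNA",
                       frequencies.file = my_file, svm.model = my_model,
                       parallel.cores = 2)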

## End(Not run)

[Package LncFinder version 1.1.5 Index]