automatic_analysis {localScore} | R Documentation |
Automatic analysis
Description
Calculates local score and p-value for sequence(s) with integer scores.
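For orientation, the local score of a sequence is the largest sum over any contiguous segment, bounded below by 0. The base-R sketch below illustrates that definition only. `local_score_sketch` is a hypothetical helper written for this page, not a function of the package, which uses its own routines.

```r
# Hypothetical helper (not part of localScore): local score of an integer
# score sequence, i.e. the largest sum over any contiguous segment, floored at 0.
local_score_sketch <- function(x) {
  best <- 0      # best segment sum seen so far
  running <- 0   # best sum of a segment ending at the current position
  for (s in x) {
    running <- max(0, running + s)  # restart whenever the sum drops below 0
    best <- max(best, running)
  }
  best
}

local_score_sketch(c(1, -2, 3, 4, -10, 2))
```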
Usage
automatic_analysis(
sequences,
model,
scores,
transition_matrix,
distribution,
method_limit = 2000,
score_extremes,
modelFunc,
simulated_sequence_length = 1000,
...
)
Arguments
sequences |
sequences to be analysed (named list) |
model |
the underlying model of the sequence (either "iid" for independent and identically distributed variables or "markov" for Markov chains) |
scores |
vector with the minimum and maximum of the score range |
transition_matrix |
if the sequences are Markov chains, this is their transition matrix |
distribution |
vector of probabilities in ascending score order (iid sequences). Note that the names of the vector must be the associated scores. |
method_limit |
sequence length above which computation-intensive exact methods for the p-value are replaced by approximate methods |
score_extremes |
a vector with two elements: minimal score value, maximal score value |
modelFunc |
function that creates similar sequences. If it is provided, Monte Carlo simulation is used to calculate the p-value |
simulated_sequence_length |
if a modelFunc is provided and the sequence is longer than method_limit, the method karlinMonteCarlo is used. This method requires the length of the sequences that will be created by the modelFunc for estimation of the Gumbel parameters. |
... |
parameters for modelFunc |
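The distribution argument described above is a probability vector whose names are the scores. The sketch below shows one way to build such a vector in base R, plus an empirical estimate from an observed sequence. The variable names are illustrative, and whether the package derives its empirical distribution in exactly this way is an assumption.

```r
# Scores covered by the score system, in ascending order
score_values <- -3:1

# Probabilities in the same order; they must sum to 1
probs <- c(0.3, 0.3, 0.1, 0.1, 0.2)

# Attach the scores as names, as the 'distribution' argument expects
distribution <- setNames(probs, score_values)

# Empirical alternative: estimate the distribution from an observed sequence.
# factor(..., levels = ) keeps scores that never occur, with probability 0.
set.seed(1)
observed <- sample(score_values, size = 500, replace = TRUE, prob = probs)
counts <- table(factor(observed, levels = score_values))
empirical <- setNames(as.numeric(counts) / length(observed), score_values)
```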
Details
This function selects a suitable p-value method for your input.
If no sequences are passed to this function, it opens a file dialog that lets you pick a FASTA file.
In that case, if you have not provided a score system
(which you can do by passing a named list with the appropriate score for each character),
a second file dialog pops up for choosing a file containing the scores
(if the file has an extra column with probabilities, they are used too; see
section File Formats in the vignette for details).
The function then uses either the empirical distribution derived from your input or, if you provided
one, your own distribution to calculate the p-value, taking into account the length of each of the
input sequences.
You can influence the choice of the method by providing the modelFunc argument. In this case, the
function uses simulation methods exclusively (monteCarlo, karlinMonteCarlo).
By setting method_limit you can further control up to which sequence length the computation-intensive methods (daudin, exact_mc)
are used to calculate the p-value.
Note that the warnings of the localScoreC() function are suppressed when it is called by automatic_analysis().
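The selection rules described above can be summarised in a small decision function. This is a hypothetical sketch, not the package's code. `choose_method_sketch` and its `has_model_func` flag are invented for illustration. Pairing daudin with the iid model and exact_mc with the Markov model is an assumption. The inclusive `<=` boundary is inferred from the example in which method_limit = 3000 forces the exact method on a sequence of length 3000.

```r
# Hypothetical sketch of the selection rules described in Details; the real
# function may apply further conditions.
choose_method_sketch <- function(seq_length, model = c("iid", "markov"),
                                 method_limit = 2000, has_model_func = FALSE) {
  model <- match.arg(model)
  short <- seq_length <= method_limit
  if (has_model_func) {
    # With a modelFunc, only simulation methods are used
    return(if (short) "monteCarlo" else "karlinMonteCarlo")
  }
  if (short) {
    # Assumption: daudin serves the iid model, exact_mc the Markov model
    return(if (model == "iid") "daudin" else "exact_mc")
  }
  # Placeholder label: some approximate method replaces the exact one
  "approximation"
}

choose_method_sketch(3000, "iid")                       # "approximation"
choose_method_sketch(3000, "iid", method_limit = 3000)  # "daudin"
choose_method_sketch(150, "markov")                     # "exact_mc"
```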
Value
A list object containing
Local score |
local score... |
p-value |
p-value ... |
Method |
the method used to calculate the p-value |
Examples
# Minimal example
l = list()
seq1 = sample(-2:1, size = 3000, replace = TRUE)
seq2 = sample(-3:1, size = 150, replace = TRUE)
l[["hello"]] = seq1
l[["world"]] = seq2
automatic_analysis(l, "iid")
# Example with a given distribution
automatic_analysis(l,"iid",scores=c(-3,1),distribution=c(0.3,0.3,0.1,0.1,0.2))
# forcing the exact method for the longest sequence
aa1=automatic_analysis(l,"iid")
aa1$hello$`method applied`
aa1$hello$`p-value`
aa2=automatic_analysis(l,"iid",method_limit=3000)
aa2$hello$`method applied`
aa2$hello$`p-value`
# Markovian example
MyTransMat <-
  matrix(c(0.3, 0.1, 0.1, 0.1, 0.4,
           0.3, 0.2, 0.2, 0.2, 0.1,
           0.3, 0.4, 0.1, 0.1, 0.1,
           0.3, 0.3, 0.3, 0.0, 0.1,
           0.1, 0.3, 0.2, 0.3, 0.1), ncol = 5, byrow = TRUE)
MySeq.CM=transmatrix2sequence(matrix = MyTransMat,length=150, score =-2:2)
MySeq.CM2=transmatrix2sequence(matrix = MyTransMat,length=110, score =-2:2)
automatic_analysis(sequences = list("x1" = MySeq.CM, "x2" = MySeq.CM2), model = "markov")