tpr.dag.cv {HEMDAG} | R Documentation |
TPR-DAG cross-validation experiments
Description
Correct the computed scores in a hierarchy according to the chosen TPR-DAG
ensemble variant.
Usage
tpr.dag.cv(
S,
g,
ann,
norm = FALSE,
norm.type = NULL,
positive = "children",
bottomup = "threshold",
topdown = "gpav",
W = NULL,
parallel = FALSE,
ncores = 1,
threshold = seq(from = 0.1, to = 0.9, by = 0.1),
weight = 0,
kk = 5,
seed = 23,
metric = "auprc",
n.round = NULL
)
Arguments
S |
a named flat scores matrix with examples on rows and classes on columns.
|
g |
a graph of class graphNEL . It represents the hierarchy of the classes.
|
ann |
an annotation matrix: rows correspond to examples and columns to classes. ann[i,j]=1 if example i belongs to
class j , ann[i,j]=0 otherwise. The ann matrix is needed to tune the hyper-parameter(s) of the chosen parametric
TPR-DAG ensemble variant with respect to the metric selected in metric . For the parameter-free ensemble variant set ann=NULL .
|
norm |
a boolean value. Should the flat score matrix be normalized? By default norm=FALSE . If norm=TRUE the matrix S
is normalized according to the normalization type selected in norm.type .
|
norm.type |
a character string. It can be one of the following values:
-
NULL (def.): no normalization is applied (norm=FALSE );
-
maxnorm : each score is divided by the maximum value of its class (scores.normalization );
-
qnorm : quantile normalization. The preprocessCore package is used (scores.normalization );
|
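The maxnorm rule described above can be sketched in base R. This is only an illustration on a made-up matrix, not the code of scores.normalization ; the guard against a zero maximum is an assumption added for safety.

```r
# Toy flat scores matrix (hypothetical data): 3 examples x 2 classes.
S <- matrix(c(0.2, 0.8, 0.5,
              0.1, 0.4, 0.3),
            nrow = 3,
            dimnames = list(c("e1", "e2", "e3"), c("c1", "c2")))

# Maximum score of each class (column).
col.max <- apply(S, 2, max)
col.max[col.max == 0] <- 1  # assumption: leave all-zero classes unchanged

# Divide every score by the maximum of its class.
S.norm <- sweep(S, 2, col.max, "/")
print(S.norm)
```

After this step the largest score of every class equals 1, so classes with different score ranges become comparable.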
positive |
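choice of the positive nodes to be considered in the bottom-up strategy. Can be one of the following values:
-
children (def.): for each node, the positive children are considered;
-
descendants : for each node, the positive descendants are considered;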
|
bottomup |
strategy to enhance the flat predictions by propagating the positive predictions from leaves to root. It can be one of the following values:
-
threshold.free : positive nodes are selected on the basis of the threshold.free strategy;
-
threshold (def. ): positive nodes are selected on the basis of the threshold strategy;
-
weighted.threshold.free : positive nodes are selected on the basis of the weighted.threshold.free strategy;
-
weighted.threshold : positive nodes are selected on the basis of the weighted.threshold strategy;
-
tau : positive nodes are selected on the basis of the tau strategy.
NOTE: tau is only a DESCENS variant. If you select the tau strategy, you must set positive=descendants .
|
topdown |
strategy to make the scores hierarchy-consistent (def. gpav ).
|
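Hierarchy consistency is usually understood as: the score of a class never exceeds the score of any of its parents. A minimal base-R check of that property is sketched below; the edge list, the toy scores and the helper is.consistent are hypothetical and are not part of HEMDAG.

```r
# Toy DAG (hypothetical): root -> A, root -> B, A -> C, B -> C.
edges <- rbind(c("root", "A"), c("root", "B"), c("A", "C"), c("B", "C"))
colnames(edges) <- c("parent", "child")

# Toy hierarchical scores: 2 examples x 4 classes.
S.hier <- matrix(c(1.0, 0.7, 0.6, 0.5,
                   1.0, 0.4, 0.9, 0.3),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("e1", "e2"), c("root", "A", "B", "C")))

# TRUE if, for every example and every edge, score(child) <= score(parent).
is.consistent <- function(S, edges, tol = 1e-9) {
  child  <- S[, edges[, "child"], drop = FALSE]
  parent <- S[, edges[, "parent"], drop = FALSE]
  all(child <= parent + tol)
}

ok <- is.consistent(S.hier, edges)

# Break the property: C now scores higher than its parents A and B for e1.
S.bad <- S.hier
S.bad["e1", "C"] <- 0.8
bad <- is.consistent(S.bad, edges)
print(c(ok = ok, bad = bad))
```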
W |
vector of weights relative to a single example. If W=NULL (def.), W is assumed to be a vector of ones whose length equals
the number of columns of the matrix S (root node included). Set W only if topdown=gpav .
|
parallel |
a boolean value (def. FALSE ).
Use parallel only if topdown=gpav ; otherwise set parallel=FALSE .
|
ncores |
number of cores to use for parallel execution. Set ncores=1 if parallel=FALSE , otherwise set ncores to the
desired number of cores. Set ncores if topdown=gpav , otherwise set ncores=1 .
|
threshold |
range of threshold values to be tested in order to find the best threshold (def: from:0.1 , to:0.9 , by:0.1 ).
A denser range makes it more likely to find the best threshold, at the cost of a longer execution time.
For the threshold-free variants, set threshold=0 .
|
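As a rough illustration of this trade-off, the default grid can be compared with a denser one. The assumption here (not stated on this page) is that the tuning cost grows roughly in proportion to the number of candidate values.

```r
coarse <- seq(from = 0.1, to = 0.9, by = 0.1)   # default grid from Usage
fine   <- seq(from = 0.1, to = 0.9, by = 0.05)  # denser, hypothetical grid

n.coarse <- length(coarse)
n.fine   <- length(fine)
cat("candidate thresholds:", n.coarse, "vs", n.fine, "\n")
```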
weight |
range of weight values to be tested in order to find the best weight (e.g. from:0.1 , to:0.9 , by:0.1 ; the default in Usage is weight=0 ).
A denser range makes it more likely to find the best weight, at the cost of a longer execution time.
For the weight-free variants, set weight=0 .
|
kk |
number of folds of the cross-validation (def: kk=5 ) on which to tune the parameters threshold , weight and tau of
the parametric ensemble variants. For the parameter-free variants (i.e. if bottomup=threshold.free ), set kk=NULL .
|
seed |
initialization seed for the random generator to create folds (def. 23 ). If seed=NULL folds are generated without seed
initialization. If bottomup=threshold.free , set seed=NULL .
|
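The role of kk and seed can be illustrated with a plain random split in base R. This is a sketch under assumptions: make.folds is a hypothetical helper, and the package may build its folds differently (for instance with stratification).

```r
# Split the indices 1..n into kk folds; a non-NULL seed makes the split reproducible.
make.folds <- function(n, kk = 5, seed = 23) {
  if (!is.null(seed)) set.seed(seed)
  idx <- sample(n)                                # random permutation of 1..n
  split(idx, rep(seq_len(kk), length.out = n))    # deal indices into kk folds
}

folds.a <- make.folds(n = 20, kk = 5, seed = 23)
folds.b <- make.folds(n = 20, kk = 5, seed = 23)
print(lengths(folds.a))
```

With the same seed the two calls return identical folds; with seed=NULL the split depends on the current state of the random generator.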
metric |
a character string specifying the performance metric to maximize when tuning the parametric ensemble variant. It can be one of the following values:
-
auprc (def.): the parametric ensemble variant is maximized on the basis of AUPRC (auprc );
-
fmax : the parametric ensemble variant is maximized on the basis of Fmax (multilabel.F.measure );
-
NULL : the threshold.free variant is parameter-free, so no optimization is needed.
|
n.round |
number of rounding digits (e.g. 3 ) to be applied to the hierarchical scores matrix when choosing the best threshold on the basis of
the best Fmax. The default in Usage is n.round=NULL . If bottomup=threshold.free or metric="auprc" , set n.round=NULL .
|
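Several arguments constrain each other (tau requires descendants, parallel execution applies only to gpav, the threshold.free variant expects NULL tuning arguments). The hypothetical helper below collects those rules as stated in the argument descriptions; it is not part of HEMDAG and only mirrors the text of this page.

```r
# Return a character vector of violated rules (empty if the combination looks fine).
check.tpr.args <- function(positive = "children", bottomup = "threshold",
                           topdown = "gpav", parallel = FALSE, ncores = 1,
                           kk = 5, seed = 23, metric = "auprc") {
  msg <- character(0)
  if (bottomup == "tau" && positive != "descendants")
    msg <- c(msg, "bottomup='tau' requires positive='descendants'")
  if (topdown != "gpav" && (parallel || ncores != 1))
    msg <- c(msg, "parallel and ncores apply only when topdown='gpav'")
  if (!parallel && ncores != 1)
    msg <- c(msg, "set ncores=1 when parallel=FALSE")
  if (bottomup == "threshold.free" &&
      !(is.null(kk) && is.null(seed) && is.null(metric)))
    msg <- c(msg, "threshold.free expects kk, seed and metric to be NULL")
  msg
}

default.issues <- check.tpr.args()                  # defaults from Usage
tau.issues     <- check.tpr.args(bottomup = "tau")  # positive left at "children"
free.issues    <- check.tpr.args(bottomup = "threshold.free",
                                 kk = NULL, seed = NULL, metric = NULL)
print(tau.issues)
```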
Details
The parametric hierarchical ensemble variants are cross-validated: their hyper-parameters are tuned by maximizing the metric selected in metric .
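The general idea of tuning a threshold by cross-validation can be sketched in base R on toy data. Everything here is an assumption made for illustration: toy.metric is a hypothetical stand-in for AUPRC or Fmax, the folds are fixed by hand, and the exact way the package uses training and test folds is not described on this page.

```r
# Toy inputs (hypothetical): scores of 10 examples for one class, with their labels.
scores <- c(0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95)
labels <- c(0, 0, 0, 1, 0, 1, 1, 1, 1, 1)
threshold <- seq(from = 0.1, to = 0.9, by = 0.1)

# Hypothetical stand-in for the metric: accuracy of the rule "score > t".
toy.metric <- function(s, y, t) mean((s > t) == y)

# Two hand-made folds of held-out example indices.
folds <- list(c(1, 3, 5, 7, 9), c(2, 4, 6, 8, 10))

# For each fold, pick the threshold that maximizes the metric on the other examples.
best <- sapply(folds, function(test) {
  train <- setdiff(seq_along(scores), test)
  perf <- sapply(threshold, function(t) toy.metric(scores[train], labels[train], t))
  threshold[which.max(perf)]  # ties are resolved in favour of the smallest threshold
})
print(best)
```

Each fold may select a different value, which is why the held-out examples of a fold are corrected with the parameter chosen for that fold in this sketch.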
Value
A named matrix with the scores of the functional terms corrected according to the chosen TPR-DAG
ensemble algorithm.
Examples
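library(HEMDAG)
data(graph)   # example hierarchy (object g)
data(scores)  # example flat scores matrix (object S)
data(labels)  # example annotation matrix (not needed by the call below)
# threshold.free is parameter-free: no annotations, folds, seed or metric are required
S.tpr <- tpr.dag.cv(S, g, ann=NULL, norm=FALSE, norm.type=NULL, positive="children",
    bottomup="threshold.free", topdown="gpav", W=NULL, parallel=FALSE, ncores=1,
    threshold=0, weight=0, kk=NULL, seed=NULL, metric=NULL, n.round=NULL)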
[Package HEMDAG version 2.7.4]