
tpr.dag.holdout {HEMDAG}

R Documentation

TPR-DAG holdout experiments

Description

Correct the computed scores in a hierarchy according to the selected TPR-DAG ensemble variant by applying a classical holdout procedure.
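As a rough illustration of what "correcting the scores" means, the base-R sketch below runs one TPR-DAG pass for a single example on a toy DAG. It first applies a bottom-up step with the `threshold` strategy and `positive="children"`, where a node's score is averaged with the scores of its children that exceed the threshold. It then applies an `htd` top-down step, where each node's score is capped at the minimum of its parents' scores. This is a simplified sketch under stated assumptions, not HEMDAG's implementation. The function `tpr.dag.sketch` and the `parents`/`children` lists are hypothetical, and the nodes must be listed in topological order (root first).

```r
# Simplified TPR-DAG pass for one example (illustrative sketch, not HEMDAG code).
# Assumptions: `y` is a named vector of flat scores whose names are listed in
# topological order (root first); `parents` and `children` are hypothetical
# named lists giving the parents and children of each node.
tpr.dag.sketch <- function(y, parents, children, threshold = 0.5) {
  nodes <- names(y)
  ybar <- y
  # Bottom-up ("threshold" strategy, positive = "children"): visit nodes in
  # reverse topological order so children are updated before their parents.
  for (i in rev(nodes)) {
    ch <- children[[i]]
    pos <- ch[ybar[ch] > threshold]            # children above the threshold
    if (length(pos) > 0)
      ybar[i] <- (y[i] + sum(ybar[pos])) / (1 + length(pos))
  }
  # Top-down (HTD-DAG): visit nodes in topological order and cap each score
  # at the minimum score of its parents.
  for (i in nodes) {
    pa <- parents[[i]]
    if (length(pa) > 0)
      ybar[i] <- min(ybar[i], ybar[pa])
  }
  ybar
}

# Toy DAG: A is the root, B and C are children of A, D is a child of B and C.
y <- c(A = 0.6, B = 0.7, C = 0.1, D = 0.8)
parents  <- list(A = character(0), B = "A", C = "A", D = c("B", "C"))
children <- list(A = c("B", "C"), B = "D", C = "D", D = character(0))
res <- tpr.dag.sketch(y, parents, children, threshold = 0.5)
```

With `threshold=0.5`, `res` is approximately `c(A=0.675, B=0.675, C=0.45, D=0.45)`. After the top-down step, no node scores higher than any of its parents.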

Usage

tpr.dag.holdout(
  S,
  g,
  ann,
  testIndex,
  norm = FALSE,
  norm.type = NULL,
  W = NULL,
  parallel = FALSE,
  ncores = 1,
  positive = "children",
  bottomup = "threshold",
  topdown = "htd",
  threshold = seq(from = 0.1, to = 0.9, by = 0.1),
  weight = seq(from = 0.1, to = 0.9, by = 0.1),
  kk = 5,
  seed = 23,
  metric = "auprc",
  n.round = NULL
)

Arguments

`S`	a named flat scores matrix with examples on rows and classes on columns.
`g`	a graph of class `graphNEL`. It represents the hierarchy of the classes.
`ann`	an annotation matrix: rows correspond to examples and columns to classes. `ann[i,j]=1` if example `i` belongs to class `j`, `ann[i,j]=0` otherwise. The `ann` matrix is needed to tune the hyper-parameter(s) of the chosen parametric `TPR-DAG` ensemble variant with respect to the metric selected in `metric`. For the parameter-free ensemble variant, set `ann=NULL`.
`testIndex`	a vector of integers giving the indices of the rows of the scores matrix `S` that make up the test set.
`norm`	a boolean value. Should the flat score matrix be normalized? By default `norm=FALSE`. If `norm=TRUE` the matrix `S` is normalized according to the normalization type selected in `norm.type`.
`norm.type`	a character string. It can be one of the following values: `NULL` (def.): no normalization is applied (`norm=FALSE`); `maxnorm`: each score is divided by the maximum score of its class; `qnorm`: quantile normalization, computed with the preprocessCore package.
`W`	a vector of weights relative to a single example. If `W=NULL` (def.), `W` is assumed to be a vector of ones whose length equals the number of columns of `S` (root node included). Set `W` only if `topdown=gpav`.
`parallel`	a boolean value: `TRUE`: execute the parallel implementation of GPAV (`gpav.parallel`); `FALSE` (def.): execute the sequential implementation of GPAV (`gpav.over.examples`). Use `parallel` only if `topdown=gpav`; otherwise set `parallel=FALSE`.
`ncores`	number of cores to use for parallel execution. Set `ncores=1` if `parallel=FALSE`, otherwise set `ncores` to the desired number of cores. Set `ncores` if and only if `topdown=gpav`; otherwise set `ncores=1`.
`positive`	choice of the positive nodes to be considered in the bottom-up strategy. It can be one of the following values: `children` (def.): positive children are considered for each node; `descendants`: positive descendants are considered for each node.
`bottomup`	strategy to enhance the flat predictions by propagating the positive predictions from the leaves to the root. It can be one of the following values: `threshold.free`: positive nodes are selected on the basis of the `threshold.free` strategy; `threshold` (def.): positive nodes are selected on the basis of the `threshold` strategy; `weighted.threshold.free`: positive nodes are selected on the basis of the `weighted.threshold.free` strategy; `weighted.threshold`: positive nodes are selected on the basis of the `weighted.threshold` strategy; `tau`: positive nodes are selected on the basis of the `tau` strategy. NOTE: `tau` is only a `DESCENS` variant; if you select the `tau` strategy, you must set `positive=descendants`.
`topdown`	strategy to make the scores hierarchy-consistent. It can be one of the following values: `htd` (def.): the `HTD-DAG` strategy is applied (`htd`); `gpav`: the `GPAV` strategy is applied (`gpav`).
`threshold`	range of threshold values to be tested in order to find the best threshold (def. `from:0.1`, `to:0.9`, `by:0.1`). A denser range makes it more likely to find the best threshold, at the cost of a longer execution time. For the threshold-free variants, set `threshold=0`.
`weight`	range of weight values to be tested in order to find the best weight (def. `from:0.1`, `to:0.9`, `by:0.1`). A denser range makes it more likely to find the best weight, at the cost of a longer execution time. For the weight-free variants, set `weight=0`.
`kk`	number of folds of the cross-validation (def. `kk=5`) used to tune the parameters `threshold`, `weight` and `tau` of the parametric ensemble variants. For the parameter-free variants (i.e. if `bottomup=threshold.free`), set `kk=NULL`.
`seed`	initialization seed for the random number generator used to create the folds (def. `23`). If `seed=NULL`, folds are generated without seed initialization. If `bottomup=threshold.free`, set `seed=NULL`.
`metric`	a character string specifying the performance metric maximized when tuning the parametric ensemble variant. It can be one of the following values: `auprc` (def.): the parametric ensemble variant is tuned by maximizing AUPRC (`auprc`); `fmax`: the parametric ensemble variant is tuned by maximizing Fmax (`multilabel.F.measure`); `NULL`: the `threshold.free` variant is parameter-free, so no optimization is needed.
`n.round`	number of rounding digits (e.g. `3`) to be applied to the hierarchical scores matrix when choosing the best threshold on the basis of the best Fmax. If `bottomup=threshold.free` or `metric="auprc"`, set `n.round=NULL` (def.).
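To make the `maxnorm` option of `norm.type` concrete, the sketch below divides every column (class) of a scores matrix by that column's maximum, so that the top score of each class becomes 1. It is an assumption-labeled illustration in base R, not HEMDAG's code. The name `maxnorm.sketch` is hypothetical, and the sketch presumes each class has a positive maximum. HEMDAG's own normalization may differ in details such as how it handles all-zero columns.

```r
# Hypothetical sketch of "maxnorm": divide each class (column) by its maximum.
# Assumes every column of S has a positive maximum score.
maxnorm.sketch <- function(S) {
  col.max <- apply(S, 2, max)      # maximum score of each class
  sweep(S, 2, col.max, "/")        # divide every column by its own maximum
}

# Toy scores matrix: examples on rows, classes on columns.
S <- matrix(c(0.2, 0.4, 0.5, 0.25), nrow = 2,
            dimnames = list(c("e1", "e2"), c("c1", "c2")))
N <- maxnorm.sketch(S)
```

`sweep` keeps the row and column names of `S`, so the normalized matrix stays a named matrix, as the function expects.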

Details

The hyper-parameters of the parametric hierarchical ensemble variants are tuned by cross-validation, maximizing the metric selected in `metric`.
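The sketch below shows a generic k-fold grid search of the kind described here. Rows are randomly assigned to `kk` folds, reproducibly when `seed` is set. Each candidate value in the grid is then scored on every fold, and the value with the highest mean score is kept. A TPR-DAG variant has nothing to fit apart from the tuned value, so in this sketch each fold simply acts as a validation set. The helper `cv.grid.search` and its `eval.fun(p, idx)` interface are hypothetical: `eval.fun` returns the metric (e.g. AUPRC) obtained with parameter `p` on rows `idx`. HEMDAG's actual fold handling may differ.

```r
# Hypothetical k-fold grid search (illustrative sketch, not HEMDAG code).
# n:        number of rows available for tuning (assumed n >= kk)
# grid:     candidate values, e.g. seq(0.1, 0.9, by = 0.1)
# eval.fun: function(p, idx) returning the metric of value p on rows idx
cv.grid.search <- function(n, grid, eval.fun, kk = 5, seed = 23) {
  if (!is.null(seed)) set.seed(seed)
  # Randomly assign each row to one of kk roughly equal-sized folds.
  folds <- sample(rep(seq_len(kk), length.out = n))
  # Mean metric over the folds for every candidate value.
  mean.score <- sapply(grid, function(p) {
    mean(sapply(seq_len(kk), function(f) eval.fun(p, which(folds == f))))
  })
  grid[which.max(mean.score)]      # value with the highest mean metric
}

# Toy metric that peaks at p = 0.3 on every fold (illustration only).
toy.metric <- function(p, idx) -abs(p - 0.3)
best <- cv.grid.search(n = 20, grid = seq(0.1, 0.9, by = 0.1), eval.fun = toy.metric)
```

Averaging over folds, rather than scoring all rows at once, makes the chosen value less sensitive to a single lucky split.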

Value

A named matrix with the scores of the classes corrected according to the chosen TPR-DAG ensemble algorithm. The rows of the matrix are restricted to those indexed by `testIndex`.
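The base-R sketch below illustrates the holdout logic behind this return value. It assumes that hyper-parameters are chosen on the rows outside `testIndex` and that only the corrected test rows are returned. The helpers `holdout.sketch`, `choose.fun` and `correct.fun` are hypothetical stand-ins for the tuning and TPR-DAG correction steps, not HEMDAG functions.

```r
# Hypothetical holdout skeleton (illustrative sketch, not HEMDAG code).
holdout.sketch <- function(S, testIndex, grid, choose.fun, correct.fun) {
  # Split the rows of S into a training part and a test part.
  S.train <- S[-testIndex, , drop = FALSE]
  S.test  <- S[testIndex, , drop = FALSE]
  # Tune the hyper-parameter on the training rows only.
  best <- choose.fun(S.train, grid)
  # Correct the test rows with the tuned value: rows are shrunk to testIndex.
  correct.fun(S.test, best)
}

# Toy usage with placeholder tuning and correction functions.
S <- matrix(seq(0.1, 0.8, by = 0.1), nrow = 4,
            dimnames = list(paste0("e", 1:4), c("c1", "c2")))
out <- holdout.sketch(S, testIndex = c(2, 4), grid = c(0.5, 1),
                      choose.fun = function(S, grid) grid[which.min(abs(grid - mean(S)))],
                      correct.fun = function(S, p) S * p)
```

Here the placeholder tuner picks `0.5`, the grid value closest to the mean training score. `out` then has two rows, `e2` and `e4`, as `testIndex` requires.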

Examples

# load the example DAG, flat scores, labels and test-set indices
data(graph)
data(scores)
data(labels)
data(test.index)
# parameter-free variant: no tuning is needed, so ann, kk, seed, metric and n.round are NULL
S.tpr <- tpr.dag.holdout(S, g, ann=NULL, testIndex=test.index, norm=FALSE, norm.type=NULL,
    positive="children", bottomup="threshold.free", topdown="gpav", W=NULL, parallel=FALSE,
    ncores=1, threshold=0, weight=0, kk=NULL, seed=NULL, metric=NULL, n.round=NULL)

[Package HEMDAG version 2.7.4 Index]