tpr.dag {HEMDAG}    R Documentation

TPR-DAG ensemble variants

Description

Collection of the true-path-rule-based hierarchical learning ensemble algorithms and their variants.

TPR-DAG is a family of algorithms whose variants are defined by the choice of the bottom-up step adopted for the selection of the positive children (or descendants) and of the top-down step adopted to assure ontology-based predictions. In their most general form, the TPR-DAG algorithms adopt a two-step learning strategy:

  1. in the first step they compute a per-level bottom-up visit from leaves to root to propagate positive predictions across the hierarchy;

  2. in the second step they compute a per-level top-down visit from root to leaves in order to assure the consistency of the predictions.

It is worth noting that levels (both in the first and second step) are defined in terms of the maximum distance from the root node (see graph.levels).
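For instance, the per-level partition of the nodes can be inspected directly. A minimal sketch, assuming the example data shipped with HEMDAG:

library(HEMDAG);
data(graph);                        ## loads the example graphNEL 'g'
root <- root.node(g);               ## name of the top-level class
lev <- graph.levels(g, root=root);  ## list: level -> vector of node names
str(lev);                           ## each node sits at its maximum distance from the root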

Usage

tpr.dag(
  S,
  g,
  root = "00",
  positive = "children",
  bottomup = "threshold.free",
  topdown = "gpav",
  t = 0,
  w = 0,
  W = NULL,
  parallel = FALSE,
  ncores = 1
)

Arguments

S

a named flat scores matrix with examples on rows and classes on columns.

g

a graph of class graphNEL. It represents the hierarchy of the classes.

root

name of the class that is on the top level of the hierarchy (def. root="00").

positive

choice of the positive nodes to be considered in the bottom-up strategy. Can be one of the following values:

  • children (def.): positive children are considered for each node;

  • descendants: positive descendants are considered for each node.

bottomup

strategy to enhance the flat predictions by propagating the positive predictions from leaves to root. It can be one of the following values:

  • threshold.free (def.): positive nodes are selected on the basis of the threshold.free strategy;

  • threshold: positive nodes are selected on the basis of the threshold strategy;

  • weighted.threshold.free: positive nodes are selected on the basis of the weighted.threshold.free strategy;

  • weighted.threshold: positive nodes are selected on the basis of the weighted.threshold strategy;

  • tau: positive nodes are selected on the basis of the tau strategy. NOTE: tau is a DESCENS-specific variant; if you select the tau strategy you must set positive="descendants".

topdown

strategy to make scores “hierarchy-aware”. It can be one of the following values:

  • htd: the HTD-DAG strategy is applied (htd);

  • gpav (def.): the GPAV strategy is applied (gpav).

t

threshold for the choice of positive nodes (def. t=0). Set t only for the variants requiring a threshold for the selection of the positive nodes; otherwise set t=0.

w

weight to balance between the contribution of the node i and that of its positive nodes (def. w=0). Set w only for the weighted variants; otherwise set w=0.

W

vector of weights relative to a single example. If W=NULL (def.), W is assumed to be a unit vector whose length equals the number of columns of the matrix S (root node included). Set W only if topdown="gpav".

parallel

a boolean value. If TRUE, the parallel implementation of GPAV is run. Use parallel only if topdown="gpav"; otherwise set parallel=FALSE (def.).

ncores

number of cores to use in the parallel execution (def. ncores=1). Set ncores to the desired number of cores only if topdown="gpav" and parallel=TRUE; otherwise set ncores=1.

Details

The vanilla TPR-DAG adopts a per-level bottom-up traversal of the DAG to correct the flat predictions \hat{y}_i according to the following formula:

\bar{y}_i := \frac{1}{1 + |\phi_i|} (\hat{y}_i + \sum_{j \in \phi_i} \bar{y}_j)

where \phi_i are the positive children of i. Different strategies to select the positive children \phi_i can be applied (a minimal sketch of the threshold-free variant follows the list below):

  1. threshold-free strategy: the positive nodes are those children that can increment the score of the node i, that is those nodes that achieve a score higher than that of their parents:

    \phi_i := \{ j \in child(i) | \bar{y}_j > \hat{y}_i \}

  2. threshold strategy: the positive children are selected on the basis of a threshold that can be selected in two different ways:

    1. for each node a constant threshold \bar{t} is a priori selected:

      \phi_i := \{ j \in child(i) | \bar{y}_j > \bar{t} \}

      For instance if the predictions represent probabilities it could be meaningful to a priori select \bar{t}=0.5.

    2. the threshold is selected to maximize some performance metric \mathcal{M} estimated on the training data, as for instance the Fmax or the AUPRC. In other words the threshold is selected to maximize some measure of accuracy of the predictions \mathcal{M}(j,t) on the training data for the class j with respect to the threshold t. The corresponding set of positives \forall i \in V is:

      \phi_i := \{ j \in child(i) | \bar{y}_j > t_j^*, t_j^* = \arg \max_{t} \mathcal{M}(j,t) \}

      For instance t_j^* can be selected from a set of t \in (0,1) through internal cross-validation techniques.
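The sketch below (not the package implementation; the function name tpr.bottomup.sketch is hypothetical) illustrates the bottom-up step with the threshold-free strategy for a single example. It assumes x is a named score vector, children an adjacency list mapping each node to its children (for a graphNEL with edges directed from parents to children, as in the HEMDAG example graph, this is edges(g) from the graph package), and lev the output of graph.levels:

## per-level bottom-up visit with threshold-free positive-children selection
tpr.bottomup.sketch <- function(x, children, lev) {
    for(i in length(lev):1) {                ## from the deepest level up to the root
        for(node in lev[[i]]) {
            phi <- children[[node]];
            phi <- phi[x[phi] > x[node]];    ## threshold-free: children scoring above node i
            x[node] <- (x[node] + sum(x[phi])) / (1 + length(phi));
        }
    }
    return(x);
}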

The weighted TPR-DAG version can be designed by adding a weight w \in [0,1] to balance between the contribution of the node i and that of its positive children \phi, through their convex combination:

\bar{y}_i := w \hat{y}_i + \frac{(1 - w)}{|\phi_i|} \sum_{j \in \phi_i} \bar{y}_j

If w=1 no weight is attributed to the children and the TPR-DAG reduces to the HTD-DAG algorithm, since in this way only the prediction for node i is used in the bottom-up step of the algorithm. If w=0 only the predictors associated with the children nodes vote to predict node i. For intermediate values of w, more importance is attributed either to the predictor for node i or to its children, depending on the value of w. By combining the weighted and the threshold variants, we design the weighted-threshold variant.
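As a sketch, the weighted variant only changes the per-node combination rule of the hypothetical tpr.bottomup.sketch above (positive children are still selected threshold-free here):

tpr.weighted.sketch <- function(x, children, lev, w=0.5) {
    for(i in length(lev):1) {
        for(node in lev[[i]]) {
            phi <- children[[node]];
            phi <- phi[x[phi] > x[node]];
            if(length(phi) > 0)              ## convex combination governed by 'w'
                x[node] <- w * x[node] + (1 - w) * sum(x[phi]) / length(phi);
        }
    }
    return(x);
}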

Since the contribution of the descendants of a given node decays exponentially with their distance from the node itself, to enhance the contribution of the most specific nodes to the overall decision of the ensemble we designed the DESCENS variant. The novelty of DESCENS consists in explicitly considering the contribution of all the descendants of each node instead of only that of its children. Therefore DESCENS predictions are more influenced by the information embedded in the leaf nodes, that is the classes carrying the most specific and meaningful information from a biological and medical standpoint. For the choice of the “positive” descendants we use the same strategies adopted for the selection of the “positive” children shown above. Furthermore, we designed a variant specific to DESCENS, named DESCENS-\tau. The DESCENS-\tau variant balances the contribution between the “positive” children of a node i and that of its “positive” descendants excluding its children by adding a weight \tau \in [0,1]:

\bar{y}_i := \frac{\tau}{1+|\phi_i|}(\hat{y}_i + \sum_{j \in \phi_i} \bar{y}_j) + \frac{1-\tau}{1+|\delta_i|}(\hat{y}_i + \sum_{j\in \delta_i} \bar{y}_j)

where \phi_i are the “positive” children of i and \delta_i=\Delta_i \setminus \phi_i the descendants of i without its children. If \tau=1 we consider only the contribution of the “positive” children of i; if \tau=0 only the descendants that are not children contribute to the score, while for intermediate values of \tau we can balance the contribution of \phi_i and \delta_i positive nodes.
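A sketch of the DESCENS-\tau per-node update, reusing the names of the previous snippets and assuming desc is a precomputed list of descendants for each node (e.g. via build.descendants; whether desc includes the node itself is handled explicitly below):

descens.tau.update <- function(x, node, children, desc, tau=0.5) {
    phi <- children[[node]];
    phi <- phi[x[phi] > x[node]];                          ## "positive" children
    delta <- setdiff(desc[[node]], c(node, children[[node]]));
    delta <- delta[x[delta] > x[node]];                    ## "positive" non-child descendants
    x[node] <- tau * (x[node] + sum(x[phi])) / (1 + length(phi)) +
        (1 - tau) * (x[node] + sum(x[delta])) / (1 + length(delta));
    return(x);
}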

Simply by replacing the HTD-DAG top-down step (htd) with the GPAV approach (gpav) we design the ISO-TPR variant. The most important feature of ISO-TPR is that it maintains the hierarchical constraints by construction and selects the solution closest (in the least-squares sense) to the bottom-up predictions that obeys the True Path Rule.

Value

A named matrix with the scores of the classes corrected according to the chosen TPR-DAG ensemble algorithm.

See Also

gpav, htd

Examples

data(graph);   ## loads the example graphNEL 'g'
data(scores);  ## loads the named flat scores matrix 'S'
data(labels);  ## loads the example labels matrix (not needed by tpr.dag itself)
root <- root.node(g);
S.tpr <- tpr.dag(S, g, root, positive="children", bottomup="threshold.free",
    topdown="gpav", t=0, w=0, W=NULL, parallel=FALSE, ncores=1);
