R: POsitivity-Regression Tree (PoRT) Algorithm to Identify...

port {RISCA}

R Documentation

POsitivity-Regression Tree (PoRT) Algorithm to Identify Positivity Violations.

Description

This function identifies potential positivity violations using the PoRT algorithm.

Usage

port(group, cov.quanti, cov.quali, data, alpha, beta, gamma, pruning,
  minbucket, minsplit, maxdepth)

Arguments

`group`	A character string with the name of the exposure variable in `data`, coded 0 for the untreated/unexposed patients and 1 for the treated/exposed patients.
`cov.quanti`	A character vector with the names of the quantitative predictors in `data`.
`cov.quali`	A character vector with the names of the qualitative predictors in `data`.
`data`	A data frame in which to look for the variables related to the treatment/exposure and the predictors.
`alpha`	The minimal proportion of the whole sample that a subgroup must represent to be considered problematic. The default value is 0.05.
`beta`	The proportion of exposed or unexposed patients in a subgroup below which a positivity violation is considered. The default value is 0.05.
`gamma`	The maximum number of predictors used to define a subgroup. The default value is 2. See 'Details'.
`pruning`	If `TRUE`, report only the violations contained between two values for quantitative predictors. The default value is `FALSE`.
`minbucket`	An `rpart` parameter: minimum number of observations in any leaf. The default value is 6.
`minsplit`	An `rpart` parameter: minimum number of observations that must exist in a node in order for a split to be attempted. If only one of `minbucket` or `minsplit` is specified, the code either sets `minsplit` to `minbucket*3` or `minbucket` to `minsplit/3`, as appropriate. The default value is 20.
`maxdepth`	An `rpart` parameter: the maximum depth of any node of the final tree, with the root node counted as depth 0. With values greater than 30, `rpart` will give nonsense results on 32-bit machines. The default value is 30.
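
The `minbucket`/`minsplit` rule described above is the behaviour of `rpart.control()` in the `rpart` package. The short sketch below only illustrates that rule on `rpart.control()` itself; whether `port` forwards missing arguments in exactly the same way is an assumption, not something this page confirms.

```r
library(rpart)

# Only minbucket supplied: rpart.control() derives minsplit = minbucket * 3
ctrl1 <- rpart.control(minbucket = 6)
ctrl1$minsplit

# Only minsplit supplied: rpart.control() derives minbucket = round(minsplit / 3)
ctrl2 <- rpart.control(minsplit = 30)
ctrl2$minbucket

# Both supplied (as with the defaults documented above): values are used as given
ctrl3 <- rpart.control(minbucket = 6, minsplit = 20)
c(ctrl3$minbucket, ctrl3$minsplit)
```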

Details

In the first step, the PoRT algorithm fits one tree per predictor and records the leaves corresponding to problematic subgroups according to the hyperparameters alpha and beta (i.e., the subgroup must include at least alpha*100 percent of the whole sample, and the exposure prevalence in the subgroup must be greater than 1-beta or less than beta). If gamma=1, the algorithm stops. Otherwise, any predictor involved in a problematic subgroup identified in the first step is excluded from the second step, which fits one tree for each possible pair of remaining predictors and records the leaves corresponding to problematic subgroups. If gamma=2, the algorithm stops; otherwise, the third step builds one tree for each possible trio of remaining predictors not involved in the previously identified subgroups, and so on.
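
The first step can be sketched directly with `rpart`. The code below is an illustration under stated assumptions, not the RISCA implementation: the helper `flag_leaves()` and the simulated data frame `toy` are hypothetical, the tree is fitted as a regression tree (`method = "anova"`) on the 0/1 exposure, and the data are assumed to have no missing values. Each leaf is kept only if it meets the alpha and beta criteria described above.

```r
library(rpart)

# Hypothetical helper (not part of RISCA): fit one tree on the given
# predictors and return the leaves that satisfy the alpha/beta criteria.
flag_leaves <- function(data, group, predictors, alpha = 0.05, beta = 0.05,
                        minbucket = 6, minsplit = 20, maxdepth = 30) {
  fit <- rpart(reformulate(predictors, response = group), data = data,
               method = "anova",
               control = rpart.control(minbucket = minbucket,
                                       minsplit = minsplit,
                                       maxdepth = maxdepth, xval = 0))
  leaf <- fit$where  # row of fit$frame for the leaf of each observation
  out <- data.frame(
    leaf       = sort(unique(leaf)),
    prevalence = as.vector(tapply(data[[group]], leaf, mean)),
    size       = as.vector(tapply(data[[group]], leaf, length))
  )
  out$rel.size <- out$size / nrow(data)
  # Keep leaves that are large enough and almost never/always exposed
  out[out$rel.size >= alpha &
        (out$prevalence < beta | out$prevalence > 1 - beta), ]
}

# Simulated data (assumption): nobody older than 70 is exposed
age   <- rep(21:80, each = 5)
group <- ifelse(age > 70, 0, rep(c(0, 1), length.out = length(age)))
sex   <- factor(rep(c("F", "F", "M", "M"), length.out = length(age)))
toy   <- data.frame(age = age, sex = sex, group = group)

# Step 1: one tree per predictor
predictors <- c("age", "sex")
step1 <- lapply(predictors, function(p) flag_leaves(toy, "group", p))
names(step1) <- predictors
step1
```

Predictors with at least one flagged leaf would then be dropped before the second step, which repeats the same check on trees built from each pair of remaining predictors.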

Value

The port function returns a character string summarising all the subgroups identified as violating the positivity assumption and provides, for each of these subgroups, the exposure prevalence, the subgroup size, and the relative subgroup size (with respect to the sample size).

Author(s)

Arthur Chatton <Arthur.Chatton@univ-nantes.fr>

References

Danelian et al. Identification of positivity violations using regression trees: The PoRT algorithm. Manuscript submitted. 2022.

Examples

data("dataDIVAT2")

# PoRT with default hyperparameters
port(group="ecd", cov.quanti="age", cov.quali=c("hla", "retransplant"),
    data=dataDIVAT2)

# Illustration of the 'pruning' argument
port(group="ecd", cov.quanti="age", cov.quali=c("hla", "retransplant"),
    data=dataDIVAT2, beta=0.01)
    
port(group="ecd", cov.quanti="age", cov.quali=c("hla", "retransplant"),
    data=dataDIVAT2, beta=0.01, pruning=TRUE)

[Package RISCA version 1.0.5 Index]