psica {psica}    R Documentation

Create a tree that discovers groups having similar treatment (intervention) effects.

Description

The PSICA method first builds regression trees for each treatment group and then obtains the distribution of the effect size at given levels of the independent variables, either by bootstrap or by the bias-corrected infinitesimal jackknife. The obtained distributions are used to compute the probability that one treatment is better (its effect size is greater) than the other treatments for a given set of input values. These probabilities are then summarised in the form of a decision tree built with a special loss function. Each terminal node of the resulting tree shows the probabilities that one treatment is better than the other treatments, together with a label listing the possible best treatments.
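
The probability step can be illustrated with a minimal base-R sketch. It assumes the effect-size distribution at one input value is already available as a matrix of draws, one column per treatment. The helper name `prob_best` and the simulated draws are hypothetical and are not part of the psica package.

```r
# Hypothetical helper (not part of psica): estimate, for each treatment,
# the probability that its effect is the largest among all treatments.
# `draws` is a numeric matrix: rows are resampled effect estimates,
# columns are treatments.
prob_best <- function(draws) {
  best <- max.col(draws, ties.method = "first")  # winning column per row
  counts <- tabulate(best, nbins = ncol(draws))  # wins per treatment
  setNames(counts / nrow(draws), colnames(draws))
}

set.seed(1)
# Simulated draws standing in for bootstrap output at a single input value
draws <- cbind(treat = rnorm(1000, mean = 3.0, sd = 0.3),
               control = rnorm(1000, mean = 2.5, sd = 0.3))
p <- prob_best(draws)
print(p)
```

The returned vector sums to one, so it can be read directly as "probability of being the best treatment" at that input value.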

Usage

psica(formula, data, intervention, method = "normal",
  forestControl = list(minsplit = 10, mincriterion = 0.95, nBoots = 500,
  nTrees = 200, mtry = 5), treeControl = rpart::rpart.control(minsplit =
  20, minbucket = 10, cp = 0.003), confidence = 0.95, prune = TRUE,
  ...)

Arguments

formula

Formula that specifies the dependent variable (effect) and the independent variables (separated by '+'). The treatment variable should not appear among the independent variables.

data

Data frame containing the dependent and independent variables and the categorical treatment variable.

intervention

The name of the treatment variable

method

Choose "boot" to compute probabilities by bootstrapping random forests, or "normal" to compute probabilities by approximating the random forest variance with the bias-corrected infinitesimal jackknife.

forestControl

Parameters for growing the forests, given as a list with the following components:

  • minsplit: minimum number of observations a node must contain to be split when growing the random forest, default 10

  • mincriterion: "mincriterion" setting of the random forest, see ctree in package partykit

  • nBoots: number of trees in random forest.

  • nTrees: number of trees in each random forest

  • mtry: number of variables to be selected at each split. Choose 'sqrt(amount_of_inputs)' if the number of input variables is large, or 'amount_of_inputs' if there are few input variables.
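
The mtry guidance above can be written as a small helper. This is only a sketch: the function name `choose_mtry` and the threshold of 5 inputs separating "few" from "many" are assumptions made for illustration, not values taken from the package.

```r
# Hypothetical helper: pick mtry from the number of input variables.
# The cut-off `few` is an illustrative assumption.
choose_mtry <- function(n_inputs, few = 5) {
  if (n_inputs <= few) n_inputs else floor(sqrt(n_inputs))
}

# Build a forestControl list using the component names from the Usage section
n_inputs <- 1
forest_control <- list(minsplit = 10, mincriterion = 0.95,
                       nBoots = 200, nTrees = 200,
                       mtry = choose_mtry(n_inputs))
str(forest_control)
```

The resulting list can be passed as the forestControl argument of psica().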

treeControl

Parameters for decision tree growing, see rpart.control()
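
As a brief sketch, a custom control object can be created with rpart::rpart.control() and passed as treeControl. The values below are illustrative only and are coarser than the defaults shown in Usage.

```r
# Illustrative settings: larger nodes and a higher complexity penalty
# than the psica defaults (minsplit = 20, minbucket = 10, cp = 0.003)
tree_control <- rpart::rpart.control(minsplit = 40, minbucket = 20, cp = 0.01)
tree_control$cp
```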

confidence

Parameter that defines the cut-off probability in the loss function and also determines which treatments are included in the labels of the PSICA tree. More specifically, labels in the terminal nodes show all treatments except the useless ones, i.e. the treatments whose combined probability of being the best is smaller than 1-confidence.
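
One way to read the labelling rule is sketched below in base R. It assumes the probabilities of being best are already known for a terminal node. `node_label` is a hypothetical helper, the "/" separator is an assumption, and the package itself may order treatments or handle ties differently.

```r
# Hypothetical helper: drop the least promising treatments whose combined
# probability of being best stays below 1 - confidence; keep the rest.
node_label <- function(p_best, confidence = 0.95) {
  p_sorted <- sort(p_best)                       # weakest treatments first
  useless <- cumsum(p_sorted) < 1 - confidence   # combined mass still below cut-off
  keep <- names(p_sorted)[!useless]
  paste(sort(keep), collapse = "/")
}

p_best <- c(A = 0.02, B = 0.60, C = 0.38)
node_label(p_best, confidence = 0.95)
node_label(c(A = 0.02, B = 0.97, C = 0.01), confidence = 0.95)
```

In the first call only A is dropped, because adding C would push the discarded probability above 1-confidence. In the second call both A and C are dropped.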

prune

Should the final tree be pruned, or is a (possibly) overfitted tree desired?

...

Further arguments passed to rpart.

Value

An object of class psicaTree

References

Sysoev O, Bartoszek K, Ekström E, Ekholm Selling K (2019). “PSICA: Decision trees for probabilistic subgroup identification with categorical treatments.” Statistics in Medicine, 38(22), 4436-4452. doi: 10.1002/sim.8308, https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.8308, https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.8308.

Examples

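library(psica)

# Simulate inputs and true response curves for two treatment groups
n=100
X1=runif(n)
X1=sort(X1)
f1<- function(x){
  2*tanh(4*x-2)+3
}
X2=runif(n)
X2=sort(X2)
f2<- function(x){
  2*tanh(2*x-1)+2.3 #2.8
}
# Plot the true curves, then add the noisy observations
plot(X1,f1(X1),ylim=c(0,5), type="l")
points(X2,f2(X2), type="l")
Y1=f1(X1)+rnorm(n, 0, 0.8)
Y2=f2(X2)+rnorm(n,0,0.8)
points(X1,Y1, col="blue")
points(X2,Y2, col="red")
# Stack both groups into one data frame with a treatment indicator
data=data.frame(X=c(X1,X2), Y=c(Y1,Y2), interv=c(rep("treat",n), rep("control",n)))
# Fit the PSICA tree using the infinitesimal jackknife ("normal") method
pt=psica(Y~X, data=data, method="normal",intervention = "interv",
 forestControl=list(nBoots=200, mtry=1))
print(pt)
plot(pt)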


[Package psica version 1.0.2 Index]