pmprune {dipm}	R Documentation

Pruning A Precision Medicine Tree

Description

This function prunes classification trees designed for the precision medicine setting.

Usage

pmprune(tree)

Arguments

tree

A tree stored as a data frame, as returned by either dipm() or spmtree() (see the Examples, where spmtree() is called with dataframe = TRUE)

Details

This function implements the simple pruning strategy proposed and used in Tsai et al. (2016). Terminal sister nodes, i.e., two terminal nodes (nodes without child nodes) that share the same parent node, are removed if they have the same identified optimal treatment assignment, so that their parent node becomes a terminal node.
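To make the rule concrete, here is a minimal base-R sketch that applies it to a tree stored in the data frame layout described under Value. It is not the dipm implementation: the helper name prune_sisters() and the toy tree are hypothetical, and three details are assumptions not stated above: the check is repeated until no qualifying pair remains, the parent keeps the besttrt value already recorded for it, and the remaining nodes are not renumbered.

```r
# Hypothetical helper (not part of dipm): collapse terminal sister nodes
# that share the same besttrt, repeating until no such pair remains.
prune_sisters <- function(tree) {
  split_cols <- c("splitvar", "splitvar_name", "type", "splitval",
                  "lchild", "rchild")
  repeat {
    pruned <- FALSE
    # row positions of internal (non-terminal) nodes
    internal <- which(!is.na(tree$lchild) & !is.na(tree$rchild))
    for (i in internal) {
      l <- match(tree$lchild[i], tree$node)
      r <- match(tree$rchild[i], tree$node)
      both_terminal <- is.na(tree$lchild[l]) && is.na(tree$lchild[r])
      if (both_terminal && isTRUE(tree$besttrt[l] == tree$besttrt[r])) {
        # assumption: the parent keeps its own besttrt and becomes terminal
        tree[i, split_cols] <- NA
        tree <- tree[-c(l, r), ]
        pruned <- TRUE
        break  # row positions changed, so rescan from the top
      }
    }
    if (!pruned) break
  }
  rownames(tree) <- NULL
  tree
}

# toy tree: node 3 has two terminal children that both prefer treatment 1
toy <- data.frame(
  node = 1:5,
  splitvar = c(1, NA, 2, NA, NA),
  splitvar_name = c("X1", NA, "X2", NA, NA),
  type = c("ord", NA, "ord", NA, NA),
  splitval = c(0.5, NA, -0.2, NA, NA),
  lchild = c(2, NA, 4, NA, NA),
  rchild = c(3, NA, 5, NA, NA),
  depth = c(1, 2, 2, 3, 3),
  nsubj = c(100, 40, 60, 25, 35),
  besttrt = c(1, 2, 1, 1, 1),
  stringsAsFactors = FALSE
)
res <- prune_sisters(toy)
res$node  # 1 2 3
```

Once node 3 has become terminal, the sketch rechecks it against its sister, node 2; the two prefer different treatments, so pruning stops there.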

Value

pmprune returns the pruned classification tree as a data frame. The data frame contains the following columns of information:

`node`	Unique integer values that identify each node in the tree, where all of the nodes are indexed starting from 1
`splitvar`	Integers that represent the candidate split variable used to split each node, where all of the variables are indexed starting from 1; for terminal nodes, i.e., nodes without child nodes, the value is set equal to NA
`splitvar_name`	The names of the candidate split variables used to split each node obtained from the column names of the supplied data; for terminal nodes, the value is set equal to NA
`type`	Characters that denote the type of each candidate split variable; "bin" is for binary variables, "ord" for ordinal, and "nom" for nominal; for terminal nodes, the value is set equal to NA
`splitval`	The value(s) that define the left child node of the current split. For binary variables, a value of 0 is printed: subjects with a value of 0 for the current `splitvar` go to the left child node, and subjects with a value of 1 go to the right child node. For ordinal variables, `splitval` is numeric: subjects with values of the current `splitvar` less than or equal to `splitval` go to the left child node, and the remaining subjects, with values greater than `splitval`, go to the right child node. For nominal variables, `splitval` is a set of integers separated by commas: subjects whose category is in that set go to the left child node, and the remaining subjects go to the right child node. For terminal nodes, the value is set equal to NA
`lchild`	Integers that represent the index (i.e., `node` value) of each node's left child node; for terminal nodes, the value is set equal to NA
`rchild`	Integers that represent the index (i.e., `node` value) of each node's right child node; for terminal nodes, the value is set equal to NA
`depth`	Integers that specify the depth of each node; the root node has depth 1, its children have depth 2, etc.
`nsubj`	Integers that count the total number of subjects within each node
`besttrt`	Integers that denote the identified best treatment assignment of each node
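The splitting rules encoded in `type`, `splitval`, `lchild` and `rchild` can be read back to send a single subject down the tree. The following is a minimal base-R sketch, not a dipm function: goes_left() and find_terminal() are hypothetical helpers, and it assumes the root is node 1, that a subject's values can be looked up by `splitvar_name`, and that a nominal `splitval` is stored as a string such as "1,3".

```r
# Hypothetical helpers (not part of dipm) that read the split columns
# described above to route one subject from the root to a terminal node.
goes_left <- function(x, type, splitval) {
  splitval <- as.character(splitval)
  switch(as.character(type),
    bin = x == 0,                        # 0 goes left, 1 goes right
    ord = x <= as.numeric(splitval),     # <= splitval goes left
    nom = x %in% as.integer(trimws(strsplit(splitval, ",")[[1]])),
    stop("unknown split type: ", type)
  )
}

find_terminal <- function(tree, subject) {
  i <- match(1, tree$node)  # assumption: the root is node 1
  while (!is.na(tree$lchild[i])) {
    x <- subject[[tree$splitvar_name[i]]]
    child <- if (goes_left(x, tree$type[i], tree$splitval[i])) {
      tree$lchild[i]
    } else {
      tree$rchild[i]
    }
    i <- match(child, tree$node)
  }
  tree$node[i]
}

# toy tree: ordinal split on X1 at the root, nominal split on X3 at node 3
toy <- data.frame(
  node = 1:5,
  splitvar_name = c("X1", NA, "X3", NA, NA),
  type = c("ord", NA, "nom", NA, NA),
  splitval = c("0.5", NA, "1,3", NA, NA),
  lchild = c(2, NA, 4, NA, NA),
  rchild = c(3, NA, 5, NA, NA),
  stringsAsFactors = FALSE
)
find_terminal(toy, list(X1 = 1.2, X3 = 3))  # 4
find_terminal(toy, list(X1 = 0.2, X3 = 3))  # 2
```

The `besttrt` value of the returned node would then give the recommended treatment for that subject.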

References

Tsai, W.-M., Zhang, H., Buta, E., O'Malley, S., and Gueorguieva, R. (2016). A modified classification tree method for personalized medicine decisions. Statistics and Its Interface, 9, 239-253.

Examples


#
# ... an example with a continuous outcome variable
#     and three treatment groups
#


N = 100
set.seed(123)

# generate treatments
treatment = sample(1:3, N, replace = TRUE)

# generate candidate split variables
X1 = round(rnorm(n = N, mean = 0, sd = 1), 4)
X2 = round(rnorm(n = N, mean = 0, sd = 1), 4)
X3 = sample(1:4, N, replace = TRUE)
X4 = sample(1:5, N, replace = TRUE)
X5 = rbinom(N, 1, 0.5)
X6 = rbinom(N, 1, 0.5)
X7 = rbinom(N, 1, 0.5)
X = cbind(X1, X2, X3, X4, X5, X6, X7)
colnames(X) = paste0("X", 1:7)

# generate continuous outcome variable
calculateLink = function(X, treatment){

    10.2 - 0.3 * (treatment == 1) - 0.1 * X[, 1] + 
    2.1 * (treatment == 1) * X[, 1] +
    1.2 * X[, 2]
}

Link = calculateLink(X, treatment)
Y = rnorm(N, mean = Link, sd = 1)

# combine variables in a data frame
data = data.frame(X, Y, treatment)

# create vector of variable types
types = c(rep("ordinal", 2), rep("nominal", 2), rep("binary", 3),
          "response", "treatment")

# fit a classification tree
tree = spmtree(Y ~ treatment | ., data, types = types, dataframe = TRUE)

# prune the tree
ptree = pmprune(tree)

[Package dipm version 1.9 Index]