subgroup {TSDT} | R Documentation |
subgroup
Description
Subset a user-provided data.frame according to the subgroup specified by a node in a tree.
Usage
subgroup(splits, node, xdata, ydata = xdata)
Arguments
splits | A data.frame of splits returned from a call to parse_rpart().
node | The NodeID of the node defining the desired subgroup.
xdata | The data.frame of covariates to subset according to the subgroup definition.
ydata | The associated vector of response values to subset according to the subgroup definition. Optional; defaults to xdata.
Details
After the splits from an rpart.object are extracted by a call to parse_rpart(), the extracted splits define a subgroup for each node, and that subgroup can be used to subset a user-provided data.frame. This function takes a data.frame of splits obtained from parse_rpart(), a NodeID indicating which node specifies the desired subgroup, a data.frame of covariates to subset, and (optionally) the associated response data to subset. If only xdata is supplied, the subset of xdata implied by the subgroup is returned. If both xdata and ydata are supplied, the subset of ydata is returned instead. xdata is required in either case because subgroup membership is computed from the covariate values, even when the data returned come from ydata.
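The relationship between xdata and ydata can be illustrated with a minimal base R sketch. It is not the TSDT implementation: the toy data, the rule X1 >= 0.5, and the names in_subgroup, x_subset, and y_subset are assumptions made purely for illustration, standing in for a subgroup definition that would normally come from parse_rpart().

```r
# Toy covariates and response; the values are made up for illustration
covariates <- data.frame(
  X1 = c( 0.2, 0.7, 0.9, 0.4 ),
  X4 = c( "A", "B", "A", "C" )
)
response <- c( 5.1, 12.3, 8.8, 3.0 )

# Hypothetical subgroup rule (not produced by parse_rpart()): X1 >= 0.5
in_subgroup <- covariates$X1 >= 0.5

# Only covariates supplied: the matching covariate rows are kept
x_subset <- covariates[ in_subgroup, , drop = FALSE ]

# Covariates and response supplied: membership is still computed from the
# covariates, but the matching response values are kept
y_subset <- response[ in_subgroup ]
```

In both cases the logical index is derived from the covariates alone, which is why the covariate data are needed even when only the response subset is wanted.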
Value
A data.frame containing the data consistent with the specified subgroup.
See Also
parse_rpart, rpart, rpart.object
Examples
requireNamespace( "rpart", quietly = TRUE )
## Generate example data containing response, treatment, and covariates
N <- 20
continuous_response <- runif( N, min = 0, max = 20 )
trt <- sample( c('Control','Experimental'), size = N, prob = c(0.4,0.6), replace = TRUE )
X1 <- runif( N, min = 0, max = 1 )
X2 <- runif( N, min = 0, max = 1 )
X3 <- sample( c(0,1), size = N, prob = c(0.2,0.8), replace = TRUE )
X4 <- sample( c('A','B','C'), size = N, prob = c(0.6,0.3,0.1), replace = TRUE )
covariates <- data.frame( trt )
names( covariates ) <- "trt"
covariates$X1 <- X1
covariates$X2 <- X2
covariates$X3 <- X3
covariates$X4 <- X4
## Fit an rpart model
fit <- rpart::rpart( continuous_response ~ trt + X1 + X2 + X3 + X4 )
## Return parsed splits with subgroups
splits1 <- parse_rpart( fit, include_subgroups = TRUE )
splits1
## Subset covariate data according to split for NodeID 3
ex1 <- subgroup( splits = splits1, node = 3, xdata = covariates )
ex1
## Subset response data according to split for NodeID 3
ex2 <- subgroup( splits = splits1, node = 3, xdata = covariates, ydata = continuous_response )
ex2