| rpartD {Qindex} | R Documentation |
Dichotomize via Recursive Partitioning
Description
Dichotomize one or more predictors of a Surv, a logical, or a double response, using recursive partitioning and regression trees (rpart).
Usage
rpartD(
y,
x,
check_degeneracy = TRUE,
cp = .Machine$double.eps,
maxdepth = 2L,
...
)
m_rpartD(y, X, check_degeneracy = TRUE, ...)
Arguments
y |
a Surv object,
a logical vector,
or a double vector, the response |
x |
double vector, the predictor |
check_degeneracy |
logical scalar, whether to allow the
dichotomized value to be all-TRUE or all-FALSE.
Default TRUE |
cp |
double scalar, complexity parameter, see rpart.control.
Default .Machine$double.eps |
maxdepth |
positive integer scalar, maximum depth of any node, see rpart.control.
Default 2L |
... |
additional parameters of rpart and/or rpart.control |
X |
numeric matrix,
a set of predictors.
Each column of X is one predictor |
Details
Dichotomize Single Predictor
Function rpartD() dichotomizes one predictor in the following steps:
1. A recursive partitioning and regression tree (rpart) analysis is performed for the response y and the predictor x.
2. The labels.rpart of the first node of the rpart tree is taken as the dichotomizing rule of the double predictor x. The term dichotomizing rule refers to the combination of an inequality sign (>, >=, < or <=) and a double cutoff threshold a.
3. The dichotomizing rule from Step 2 is further processed, such that
   - < a is regarded as >= a
   - <= a is regarded as > a
   - > a and >= a are kept as is.
   This step is necessary for a narrative of greater than, or greater than or equal to, the threshold a.
4. A warning message is produced if the dichotomizing rule, applied to a new double predictor newx, creates an all-TRUE or all-FALSE result. The algorithm does not stop, as most regression models in R are capable of handling an all-TRUE or all-FALSE predictor by returning an NA_real_ regression coefficient estimate.
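The steps above can be sketched directly with rpart. This is a minimal illustration, not the implementation of rpartD(): the function name dichotomize is hypothetical, and the sketch assumes that the rule at the first node of a double predictor is either '< a' or '>= a', so that after Step 3 the dichotomized predictor is always 'newx >= a' and only the cutoff is needed.

```r
library(rpart)
data(cu.summary, package = 'rpart')

# Step 1: shallow tree of the response on a single double predictor
fit <- rpart(Price ~ Mileage, data = cu.summary,
             cp = .Machine$double.eps, maxdepth = 2L)

# Step 2: labels() starts with 'root'; the second label is the rule
# attached to the first split
rule <- labels(fit)[2L]

# Step 3 (assumption: the rule is '< a' or '>= a'): both cases are
# reported as 'newx >= a', so only the cutoff a of the first split is kept
cutoff <- unname(fit$splits[1L, 'index'])

# hypothetical stand-in for the function returned by rpartD()
dichotomize <- function(newx) {
  ret <- (newx >= cutoff)
  # Step 4: warn, but do not stop, on an all-TRUE or all-FALSE result
  if (all(ret, na.rm = TRUE) || !any(ret, na.rm = TRUE)) {
    warning('dichotomized predictor is all-TRUE or all-FALSE')
  }
  attr(ret, 'cutoff') <- cutoff
  ret
}

d <- dichotomize(cu.summary$Mileage)
table(d, useNA = 'ifany')
```

Reading the cutoff from fit$splits rather than parsing the label text avoids the rounding that labels() applies when printing.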
Dichotomize Multiple Predictors
Function m_rpartD() dichotomizes each predictor X[,i] based on the response y using function rpartD().
When applying the multiple dichotomizing rules to a new set of predictors newX,
- A warning message is produced if at least one of the dichotomized predictors is all-TRUE or all-FALSE.
- We do not check whether two or more of the dichotomized predictors are identical to each other. This situation is handled in helper function coef_dichotom().
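Applying one rule per column, with a degeneracy warning, might look like the following base R sketch. The function name dichotomize_matrix and the cutoff values are hypothetical and chosen only for illustration; every rule is assumed to be of the form 'greater than or equal to the cutoff'.

```r
# hypothetical cutoffs, one per predictor column
cutoffs <- c(age = 60, g2 = 13, gleason = 6)

dichotomize_matrix <- function(newX, cutoffs) {
  # new predictors must match the rules by column name and order
  stopifnot(is.matrix(newX), identical(colnames(newX), names(cutoffs)))
  # compare every column against its own cutoff
  ret <- sweep(newX, MARGIN = 2L, STATS = cutoffs, FUN = '>=')
  # flag columns that are all-TRUE or all-FALSE, ignoring missing values
  degenerate <- apply(ret, MARGIN = 2L, FUN = function(z) {
    all(z, na.rm = TRUE) || !any(z, na.rm = TRUE)
  })
  if (any(degenerate)) {
    warning('degenerate column(s): ',
            paste(colnames(ret)[degenerate], collapse = ', '))
  }
  ret
}

newX <- cbind(age = c(55, 63, 70), g2 = c(10, 15, NA), gleason = c(7, 8, 9))
res <- dichotomize_matrix(newX, cutoffs) # warns: gleason is all-TRUE
```

Like the behavior described above, the sketch only warns and still returns the logical matrix, leaving it to the downstream model to handle a constant column.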
Value
Dichotomize Single Predictor
Function rpartD() returns a function
with a double vector parameter newx.
The returned value of rpartD(y,x)(newx) is a
logical vector
with attributes
- attr(,'cutoff'): double scalar, the cutoff value for newx
Dichotomize Multiple Predictors
Function m_rpartD() returns a function,
with a double matrix parameter newX.
The argument for newX must have
the same number of columns and the same column names as
the input matrix X.
The returned value of m_rpartD(y,X)(newX) is a
logical matrix
with attributes
Note
In the future, integer and factor predictors will be supported.
Examples
## Dichotomize Single Predictor
data(cu.summary, package = 'rpart') # see more details from ?rpart::cu.summary
with(cu.summary, rpartD(y = Price, x = Mileage, check_degeneracy = FALSE))
(foo = with(cu.summary, rpartD(y = Price, x = Mileage)))
foo(rnorm(10, mean = 24.5))
## Dichotomize Multiple Predictors
library(survival)
data(stagec, package = 'rpart') # see more details from ?rpart::stagec
nrow(stagec) # 146
(foo = with(stagec[1:100,], m_rpartD(y = Surv(pgtime, pgstat), X = cbind(age, g2, gleason))))
foo(as.matrix(stagec[-(1:100), c('age', 'g2', 'gleason')]))