rpartD {Qindex} | R Documentation |
Dichotomize via Recursive Partitioning
Description
Dichotomize one or more predictors of a Surv, logical, or double response, using recursive partitioning and regression trees (rpart).
Usage
rpartD(
y,
x,
check_degeneracy = TRUE,
cp = .Machine$double.eps,
maxdepth = 2L,
...
)
m_rpartD(y, X, check_degeneracy = TRUE, ...)
Arguments
y |
a Surv object, a logical vector, or a double vector, the response |
x |
double vector, the predictor |
check_degeneracy |
logical scalar, whether to allow the dichotomized value to be all-TRUE or all-FALSE |
cp |
double scalar, complexity parameter, see rpart.control. Default .Machine$double.eps |
maxdepth |
positive integer scalar, maximum depth of any node, see rpart.control. Default 2L |
... |
additional parameters of rpart and/or rpart.control |
X |
numeric matrix,
a set of predictors.
Each column of |
Details
Dichotomize Single Predictor
Function rpartD() dichotomizes one predictor in the following steps:
1. A recursive partitioning and regression tree (rpart) analysis is performed for the response y and the predictor x.
2. The labels.rpart of the first node of the rpart tree is taken as the dichotomizing rule of the double predictor x. The term dichotomizing rule refers to the combination of an inequality sign (>, >=, < or <=) and a double cutoff threshold a.
3. The dichotomizing rule from Step 2 is further processed, such that
   - < a is regarded as >= a
   - <= a is regarded as > a
   - > a and >= a are kept as is.
   This step ensures that the rule can always be narrated as greater than, or greater than or equal to, the threshold a.
4. A warning message is produced if the dichotomizing rule, applied to a new double predictor newx, creates an all-TRUE or all-FALSE result. The algorithm does not stop, as most regression models in R are capable of handling an all-TRUE or all-FALSE predictor by returning an NA_real_ regression coefficient estimate.
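The rule processing and the degeneracy warning described above can be illustrated with a minimal base-R sketch. The helpers normalize_sign() and apply_rule() are hypothetical names chosen for illustration; they are not part of Qindex, and the package's actual implementation may differ.

```r
# Hypothetical helpers for illustration only; not the Qindex implementation.

# Map any inequality sign to '>' or '>=', as in the rule processing step
normalize_sign <- function(sign) {
  switch(sign,
    '<'  = '>=',  # '< a'  is regarded as '>= a'
    '<=' = '>',   # '<= a' is regarded as '> a'
    '>'  = '>',   # kept as is
    '>=' = '>=',  # kept as is
    stop('unknown inequality sign: ', sign))
}

# Apply the processed rule to new data; warn (do not stop) on a degenerate result
apply_rule <- function(newx, sign, a) {
  ret <- if (normalize_sign(sign) == '>') (newx > a) else (newx >= a)
  if (all(ret, na.rm = TRUE) || !any(ret, na.rm = TRUE)) {
    warning('dichotomized value is all-TRUE or all-FALSE')
  }
  attr(ret, 'cutoff') <- a
  ret
}

# '< 24.5' is processed into '>= 24.5'
apply_rule(c(20, 24.5, 30), sign = '<', a = 24.5)
# FALSE TRUE TRUE, with attribute 'cutoff' equal to 24.5
```

Issuing a warning instead of an error mirrors the choice described above: a downstream regression model can still be fitted and will report a missing coefficient for the degenerate predictor.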
Dichotomize Multiple Predictors
Function m_rpartD() dichotomizes each predictor X[,i] based on the response y using function rpartD().
When the multiple dichotomizing rules are applied to a new set of predictors newX,
- A warning message is produced if at least one of the dichotomized predictors is all-TRUE or all-FALSE.
- We do not check whether two or more of the dichotomized predictors are identical to each other. This situation is handled in helper function coef_dichotom().
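As a rough illustration of applying several dichotomizing rules column by column, the base-R sketch below uses a hypothetical helper apply_rules() and a hypothetical list of rules whose cutoffs (60 and 13) are arbitrary values, not results from any fitted tree. It is not the Qindex implementation, and it does not reproduce what coef_dichotom() does; the last line only shows one way identical dichotomized columns could be detected.

```r
# Hypothetical per-column rules standing in for the output of rpartD();
# the cutoffs 60 and 13 are arbitrary illustrative values
rules <- list(
  age = function(newx) newx >= 60,
  g2  = function(newx) newx >= 13
)

# Hypothetical helper; not the Qindex implementation
apply_rules <- function(newX, rules) {
  # newX must carry the same columns, in the same order, as the rules
  stopifnot(is.matrix(newX), identical(colnames(newX), names(rules)))
  ret <- matrix(NA, nrow = nrow(newX), ncol = ncol(newX), dimnames = dimnames(newX))
  for (i in seq_along(rules)) ret[, i] <- rules[[i]](newX[, i])
  # warn, rather than stop, if any dichotomized column is degenerate
  degenerate <- apply(ret, 2L, function(z) all(z, na.rm = TRUE) || !any(z, na.rm = TRUE))
  if (any(degenerate)) {
    warning('all-TRUE or all-FALSE column(s): ',
            paste(colnames(ret)[degenerate], collapse = ', '))
  }
  ret
}

newX <- cbind(age = c(55, 62, 70), g2 = c(10, 15, 20))
ret <- apply_rules(newX, rules)
# identical dichotomized columns are not flagged by apply_rules();
# they can be detected separately
any(duplicated(ret, MARGIN = 2L))
```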
Value
Dichotomize Single Predictor
Function rpartD() returns a function with a double vector parameter newx.
The returned value of rpartD(y,x)(newx) is a logical vector with attributes
attr(,'cutoff'): double scalar, the cutoff value for newx
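The shape of this return value can be sketched with a hypothetical stand-in, rpartD_sketch(), which is not the Qindex implementation. It handles a double response only, and it reads the cutoff from the splits component of the fitted rpart object rather than from labels.rpart. It also assumes that the root split of a continuous predictor is labelled either < a or >= a, so that the processed rule is always >= a.

```r
library(rpart)

# Hypothetical stand-in for rpartD(); double response only, not the Qindex code
rpartD_sketch <- function(y, x, cp = .Machine$double.eps, maxdepth = 2L) {
  # cp and maxdepth are passed on to rpart.control() through rpart()
  tree <- rpart(y ~ x, data = data.frame(y = y, x = x), cp = cp, maxdepth = maxdepth)
  if (is.null(tree$splits)) stop('rpart found no split')
  # first row of `splits` is the primary split at the root; 'index' is its cutoff
  a <- unname(tree$splits[1L, 'index'])
  # assumption: the processed rule is '>= a' (see the lead-in)
  function(newx) {
    ret <- (newx >= a)
    attr(ret, 'cutoff') <- a
    ret
  }
}

# toy data with an obvious change point between x = 10 and x = 11
x <- as.double(1:20)
y <- rep(c(0, 10), each = 10L)
foo <- rpartD_sketch(y, x)
foo(c(5, 10.5, 15))
```

Returning a closure keeps the cutoff learned from the training data attached to the rule, so the same rule can be applied unchanged to any new vector newx.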
Dichotomize Multiple Predictors
Function m_rpartD() returns a function with a double matrix parameter newX.
The argument for newX must have the same number of columns and the same column names as the input matrix X.
The returned value of m_rpartD(y,X)(newX) is a logical matrix
with attributes
Note
In the future, integer and factor predictors will be supported.
Examples
## Dichotomize Single Predictor
data(cu.summary, package = 'rpart') # see more details from ?rpart::cu.summary
with(cu.summary, rpartD(y = Price, x = Mileage, check_degeneracy = FALSE))
(foo = with(cu.summary, rpartD(y = Price, x = Mileage)))
foo(rnorm(10, mean = 24.5))
## Dichotomize Multiple Predictors
library(survival)
data(stagec, package = 'rpart') # see more details from ?rpart::stagec
nrow(stagec) # 146
(foo = with(stagec[1:100,], m_rpartD(y = Surv(pgtime, pgstat), X = cbind(age, g2, gleason))))
foo(as.matrix(stagec[-(1:100), c('age', 'g2', 'gleason')]))