splitMedian {ecpc} | R Documentation |
Discretise continuous data in multiple granularities
Description
Discretise continuous co-data by making groups of covariates of varying size. The first group contains all covariates. Each group is then recursively split in two at the median co-data value, until a user-specified minimum group size is reached. The resulting groups are used for adaptive discretisation of continuous co-data.
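The recursion described above can be illustrated with a minimal base R sketch. The helper `median_split_sketch` and its argument `min_size` are hypothetical names introduced here for illustration; this is not the ecpc implementation, and the stopping rule (stop when either half would be smaller than the minimum size), the handling of ties, and the order of the returned groups are assumptions that may differ from `splitMedian`.

```r
# Hypothetical helper, NOT part of ecpc: a base R sketch of a recursive median split.
# Assumption: a group is split only if both halves reach the minimum size.
median_split_sketch <- function(values, index = seq_along(values), min_size = 5) {
  groups <- list(index)            # the current group is always kept
  med <- median(values)
  lower <- values <= med           # values tied with the median go to the lower half
  # stop when either half would fall below the minimum group size
  if (sum(lower) < min_size || sum(!lower) < min_size) {
    return(groups)
  }
  c(groups,
    median_split_sketch(values[lower], index[lower], min_size),
    median_split_sketch(values[!lower], index[!lower], min_size))
}

vals <- seq(0, 1, length.out = 20)
groups <- median_split_sketch(vals, min_size = 5)
lengths(groups)  # 20 10 5 5 10 5 5
```

In this sketch the group of all 20 covariates comes first, followed by the groups obtained from the lower half and then those from the upper half, so the list contains the same covariates at several granularities.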
Usage
splitMedian(values, index=NULL, depth=NULL, minGroupSize = 50, first = TRUE,
split = c("both","lower","higher"))
Arguments
values |
Vector with the continuous co-data values to be discretised. |
index |
Index of the covariates corresponding to the values supplied. Useful if part of the continuous co-data is missing and only the non-missing part should be discretised. |
depth |
(optional): if given, a discretisation is returned with 'depth' levels of granularity. |
minGroupSize |
Minimum group size that each group of covariates should have. |
split |
"both", "lower" or "higher": should both split groups of covariates be further split, or only the group of covariates that corresponds to the lower or higher continuous co-data group? |
first |
Do not change, recursion help variable. |
Value
A list with groups of covariates, which may be used as group set in ecpc.
See Also
Use obtainHierarchy to obtain a group set on group level defining the hierarchy for adaptive discretisation of continuous co-data.
Examples
cont.codata <- seq(0,1,length.out=20) #continuous co-data
#full tree with minimum group size 5
groupset1 <- splitMedian(values=cont.codata,minGroupSize=5)
#only split the lower continuous co-data group further
groupset2 <- splitMedian(values=cont.codata,split="lower",minGroupSize=5)
#discretise only a part of the continuous co-data
part <- sample(seq_along(cont.codata),15)
cont.codata[-part] <- NaN #suppose rest is missing
#make group set of non-missing values
groupset3 <- splitMedian(values=cont.codata[part],index=part,minGroupSize=5)
groupset3 <- c(groupset3,list(which(is.nan(cont.codata)))) #add missing data group
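As a possible follow-up to the examples above, the group set returned by `splitMedian` can be inspected with base R and then passed to `obtainHierarchy`, as mentioned under See Also. This is a sketch that assumes the ecpc package is installed and that `obtainHierarchy` accepts the group set as its first argument; the variable names `groupset1` and `hierarchy1` are chosen here for illustration.

```r
library(ecpc)  # assumption: the ecpc package is installed

cont.codata <- seq(0, 1, length.out = 20)  # continuous co-data
groupset1 <- splitMedian(values = cont.codata, minGroupSize = 5)

# each element of the group set is a vector of covariate indices
length(groupset1)   # number of groups over all levels of granularity
lengths(groupset1)  # size of each group

# group set on group level, describing how the groups are nested
# (argument passed by position; see the obtainHierarchy help page for details)
hierarchy1 <- obtainHierarchy(groupset1)
str(hierarchy1)
```

No output is shown here because the exact ordering and structure of the returned lists should be checked against the installed package version.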