R: Compute the grouped importance of variables (gVIMP) statistic

compute_gVIMP {DynForest}

R Documentation

Compute the grouped importance of variables (gVIMP) statistic

Description

Compute the grouped importance of variables (gVIMP) statistic for each group of predictors defined in `group`, using a fitted `DynForest` object.

Usage

compute_gVIMP(
  DynForest_obj,
  IBS.min = 0,
  IBS.max = NULL,
  group = NULL,
  ncores = NULL,
  seed = 1234
)

Arguments

`DynForest_obj`	`DynForest` object containing the dynamic random forest fitted on the training data
`IBS.min`	(Only with survival outcome) Lower bound of the time window over which the Integrated Brier Score (IBS) is computed. Default value is 0.
`IBS.max`	(Only with survival outcome) Upper bound of the time window over which the Integrated Brier Score (IBS) is computed. Default value is the maximal time-to-event found.
`group`	A named list of groups; each element is a character vector with the names of the predictors assigned to that group
`ncores`	Number of cores used to grow trees in parallel. Default value is the number of cores of the computer minus 1.
`seed`	Seed used to make the results reproducible. Default value is 1234.
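
The `group` argument is easiest to get wrong, so here is a minimal sketch of how such a list can be built and sanity-checked before calling `compute_gVIMP()`. The predictor names are the ones used in the Examples section; `check_groups()` is a hypothetical helper written for this illustration and is not part of DynForest.

```r
# Predictor names used to fit the forest (taken from the Examples section).
predictors <- c("serBilir", "SGOT", "albumin", "alkaline", "age", "drug", "sex")

# Named list: one character vector of predictor names per group.
group <- list(group1 = c("serBilir", "SGOT"),
              group2 = c("albumin", "alkaline"))

# Hypothetical helper (not part of DynForest): stop early if a group is
# unnamed or refers to a predictor that was not used in the forest.
check_groups <- function(group, predictors) {
  stopifnot(is.list(group), !is.null(names(group)), all(names(group) != ""))
  unknown <- setdiff(unlist(group, use.names = FALSE), predictors)
  if (length(unknown) > 0) {
    stop("Unknown predictor(s): ", paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

check_groups(group, predictors)
```

A check of this kind catches misspelled predictor names before the statistic is computed.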

Value

`compute_gVIMP()` returns a list with the following elements:

`Inputs`	A list of 3 elements: `Longitudinal`, `Numeric` and `Factor`. Each element contains the names of the predictors

`group`	A list of the groups defined in the `group` argument

`gVIMP`	A numeric vector containing the gVIMP for each group defined in the `group` argument

`tree_oob_err`	A numeric vector containing the out-of-bag (OOB) error for each tree, needed to compute the gVIMP statistic

`IBS.range`	A vector containing the minimum and maximum times used to compute the IBS
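
As a rough illustration of how the returned list might be inspected, the sketch below ranks groups by their gVIMP. It runs on a placeholder object that only mimics the documented element names; the numbers are made up and are not real DynForest output. `rank_groups()` is a hypothetical helper, and it assumes that the values in `gVIMP` are ordered like the groups in `group`, which this page does not state explicitly.

```r
# Placeholder object mimicking the documented structure; the numbers are
# made up for illustration and are NOT real DynForest output.
mock_gVIMP <- list(
  Inputs = list(Longitudinal = c("serBilir", "SGOT", "albumin", "alkaline"),
                Numeric = "age",
                Factor = c("drug", "sex")),
  group = list(group1 = c("serBilir", "SGOT"),
               group2 = c("albumin", "alkaline")),
  gVIMP = c(0.010, 0.025),
  tree_oob_err = c(0.12, 0.15, 0.11),
  IBS.range = c(0, 10)
)

# Hypothetical helper: tabulate gVIMP per group, largest first.
# Assumes gVIMP values are ordered like the groups in `group`.
rank_groups <- function(obj) {
  stopifnot(length(obj$gVIMP) == length(obj$group))
  out <- data.frame(group = names(obj$group),
                    gVIMP = as.numeric(obj$gVIMP),
                    stringsAsFactors = FALSE)
  out[order(out$gVIMP, decreasing = TRUE), , drop = FALSE]
}

ranked <- rank_groups(mock_gVIMP)
print(ranked)
```

With a real result, the same helper could be applied to the object returned by `compute_gVIMP()`, provided the ordering assumption holds.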

Examples


library(DynForest)
data(pbc2)

# Log-transform the longitudinal predictors so they are closer to Gaussian
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)

# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id %in% id_sample)

pbc2_train <- pbc2[id_row,]

timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]

# Specify the mixed model (fixed and random effects) for each longitudinal predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))

# Build the time-fixed predictor data (one row per subject)
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])

# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))

# Fit the dynamic random forest
res_dyn <- DynForest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)

# Compute the gVIMP statistic for two groups of longitudinal predictors
res_dyn_gVIMP <- compute_gVIMP(DynForest_obj = res_dyn,
                               group = list(group1 = c("serBilir","SGOT"),
                                            group2 = c("albumin","alkaline")),
                               ncores = 2, seed = 1234)

[Package DynForest version 1.1.3 Index]