compute_gVIMP {DynForest}    R Documentation

Compute the grouped importance of variables (gVIMP) statistic

Description

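Compute the grouped importance of variables (gVIMP) statistic. The predictors of each group are permuted jointly in the out-of-bag (OOB) samples, and gVIMP measures the resulting increase in OOB prediction error (the Integrated Brier Score for a survival outcome).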
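The idea behind a grouped permutation importance can be sketched in base R. This is a generic illustration under stated assumptions, not DynForest's implementation: the hypothetical helper grouped_perm_importance() uses a held-out data set in place of the OOB samples, mean squared error in place of the Integrated Brier Score, and a linear model in place of the random forest. All predictors of a group are shuffled with one shared row permutation, so the dependence between them is preserved while their link to the outcome is broken.

```r
# Hypothetical helper: grouped permutation importance for any model with a
# predict() method. A held-out set stands in for the OOB samples and mean
# squared error stands in for the IBS (assumptions, not DynForest code).
grouped_perm_importance <- function(fit, newdata, response, group, seed = 1234) {
  set.seed(seed)
  mse <- function(d) mean((d[[response]] - predict(fit, newdata = d))^2)
  base_err <- mse(newdata)
  sapply(group, function(vars) {
    permuted <- newdata
    # One shared row permutation for all predictors of the group
    idx <- sample(nrow(newdata))
    permuted[vars] <- newdata[idx, vars, drop = FALSE]
    # Importance = increase in error once the group is scrambled
    mse(permuted) - base_err
  })
}

# Usage on the built-in mtcars data
train <- mtcars[1:22, ]
test <- mtcars[23:32, ]
fit <- lm(mpg ~ wt + hp + qsec + drat, data = train)
imp <- grouped_perm_importance(fit, test, response = "mpg",
                               group = list(engine = c("hp", "qsec"),
                                            body = c("wt", "drat"),
                                            unused = "gear"))
print(imp)
```

A group made only of a variable the model never uses (gear here) gets an importance of exactly 0, since permuting it cannot change the predictions.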

Usage

compute_gVIMP(
  DynForest_obj,
  IBS.min = 0,
  IBS.max = NULL,
  group = NULL,
  ncores = NULL,
  seed = round(runif(1, 0, 10000))
)

Arguments

DynForest_obj

DynForest object containing the dynamic random forest built on the training data

IBS.min

(Only with survival outcome) Minimal time used to compute the Integrated Brier Score (IBS). Default is 0.

IBS.max

(Only with survival outcome) Maximal time used to compute the Integrated Brier Score. Default is the maximal observed time-to-event.

group

A named list of groups, where each element is a character vector with the names of the predictors assigned to that group

ncores

Number of cores used to run the computation in parallel. Default is the number of available cores minus one.

seed

Seed used to make the results reproducible. Default is a random integer between 0 and 10000.
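IBS.min and IBS.max bound the time window over which the Brier score is integrated. A minimal sketch of one common definition, the Brier score curve integrated with the trapezoidal rule and divided by the window length, is shown below; the helper integrated_brier() is hypothetical, and DynForest's internal computation may differ.

```r
# Hypothetical helper: Integrated Brier Score over [ibs_min, ibs_max],
# IBS = 1 / (ibs_max - ibs_min) * integral of BS(t) dt, approximated with
# the trapezoidal rule. Assumes the evaluation grid contains both bounds.
integrated_brier <- function(times, bs, ibs_min = 0, ibs_max = max(times)) {
  stopifnot(length(times) == length(bs), ibs_max > ibs_min)
  keep <- times >= ibs_min & times <= ibs_max
  o <- order(times[keep])
  t <- times[keep][o]
  b <- bs[keep][o]
  # Sum of trapezoid areas between consecutive grid points
  area <- sum(diff(t) * (head(b, -1) + tail(b, -1)) / 2)
  area / (ibs_max - ibs_min)
}

# Usage with made-up Brier score values (illustration only)
integrated_brier(times = c(0, 1, 2), bs = c(0, 0.2, 0.2))  # 0.15
integrated_brier(times = c(0, 1, 2, 3), bs = c(0.1, 0.2, 0.2, 0.5),
                 ibs_min = 1, ibs_max = 2)                  # 0.2
```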

Value

The compute_gVIMP() function returns a list with the following elements:

Inputs A list of 3 elements (Longitudinal, Numeric and Factor), each containing the names of the predictors of that type
gVIMP A numeric vector containing the gVIMP for each group defined in the group argument
tree_oob_err A numeric vector containing the OOB error of each tree, needed to compute the gVIMP statistic
IBS.range A vector containing the minimal and maximal times (IBS.min and IBS.max) used to compute the IBS
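Because tree_oob_err stores one OOB error per tree, a natural form of the statistic (assumed here, following the usual permutation VIMP of random forests, and not confirmed by this page) averages the per-tree increase in OOB error after permuting the predictors of a group G:

$$
\mathrm{gVIMP}(G) = \frac{1}{B} \sum_{b=1}^{B} \left( \mathrm{err}_b^{(G)} - \mathrm{err}_b \right)
$$

where B is the number of trees (ntree), err_b is the OOB error of tree b, and err_b^(G) is its OOB error once all predictors of G are permuted jointly.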

Author(s)

Anthony Devaux (anthony.devaux@u-bordeaux.fr)

Examples


data(pbc2)

# Log-transform longitudinal predictors to approximate a Gaussian distribution
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)

# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id %in% id_sample)

pbc2_train <- pbc2[id_row,]

timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]

# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))

# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])

# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))

# Run DynForest function
res_dyn <- DynForest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)

# Compute gVIMP statistic
res_dyn_gVIMP <- compute_gVIMP(DynForest_obj = res_dyn,
                               group = list(group1 = c("serBilir","SGOT"),
                                            group2 = c("albumin","alkaline")),
                               ncores = 2, seed = 1234)


[Package DynForest version 1.1.0 Index]