compute_VIMP {DynForest}    R Documentation

Compute the variable importance (VIMP) statistic

Description

Compute the variable importance (VIMP) statistic for each predictor of a dynamic random forest built with DynForest().
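This page does not describe how the statistic is computed. As a hedged illustration only, the sketch below shows the generic permutation idea behind VIMP in random forests (Breiman): permute one predictor in data the model did not see and measure how much the prediction error increases. Assumptions: a linear model stands in for a tree, a held-out half of the data stands in for the out-of-bag (OOB) sample, and `perm_vimp()` is a hypothetical helper. DynForest's own computation on trees and longitudinal predictors is not reproduced here.

```r
# Generic permutation importance sketch (not DynForest's implementation).
# Assumption: lm() stands in for a fitted tree, and a held-out set
# stands in for the out-of-bag (OOB) sample.
set.seed(42)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 3 * dat$x1 + rnorm(n, sd = 0.5)   # y depends on x1 only

train <- dat[1:100, ]
oob   <- dat[101:200, ]
fit   <- lm(y ~ x1 + x2, data = train)

# Mean squared prediction error on a data set
mse <- function(model, newdata) mean((newdata$y - predict(model, newdata))^2)

# Hypothetical helper: increase in error after permuting one predictor
perm_vimp <- function(model, newdata, var) {
  permuted <- newdata
  permuted[[var]] <- sample(permuted[[var]])
  mse(model, permuted) - mse(model, newdata)
}

vimp <- sapply(c("x1", "x2"), function(v) perm_vimp(fit, oob, v))
print(vimp)
```

Permuting `x1` breaks its link with `y`, so the error rises sharply, while permuting the irrelevant `x2` leaves the error almost unchanged.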

Usage

compute_VIMP(
  DynForest_obj,
  IBS.min = 0,
  IBS.max = NULL,
  ncores = NULL,
  seed = round(runif(1, 0, 10000))
)

Arguments

DynForest_obj

A DynForest object containing the dynamic random forest trained on the training data.

IBS.min

(Survival outcome only) Minimum time used to compute the Integrated Brier Score (IBS). Default is 0.

IBS.max

(Survival outcome only) Maximum time used to compute the Integrated Brier Score (IBS). Default is the maximum observed time-to-event.
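IBS.min and IBS.max bound the time window over which the Brier score is integrated. A minimal sketch, assuming no censoring (so no inverse-probability-of-censoring weights) and an IBS normalised by the window length; `brier_at()` and `integrated_brier()` are hypothetical helpers, and the estimator DynForest uses internally may differ.

```r
# Brier score at time t, assuming no censoring.
# surv_prob: predicted P(T > t) for each subject at time t.
brier_at <- function(t, obs_time, surv_prob) {
  mean(((obs_time > t) - surv_prob)^2)
}

# Integrate the Brier score over [ibs_min, ibs_max] with the
# trapezoidal rule, then divide by the window length.
integrated_brier <- function(obs_time, surv_fun, ibs_min = 0,
                             ibs_max = max(obs_time), n_grid = 100) {
  grid <- seq(ibs_min, ibs_max, length.out = n_grid)
  bs <- vapply(grid, function(t) brier_at(t, obs_time, surv_fun(t)),
               numeric(1))
  area <- sum(diff(grid) * (head(bs, -1) + tail(bs, -1)) / 2)
  area / (ibs_max - ibs_min)
}

set.seed(1)
obs_time <- rexp(50, rate = 0.2)
# Predictions from the true exponential survival curve
surv_true <- function(t) rep(exp(-0.2 * t), length(obs_time))
# Uninformative predictions: always 0.5
surv_flat <- function(t) rep(0.5, length(obs_time))

ibs_true <- integrated_brier(obs_time, surv_true, ibs_min = 0, ibs_max = 10)
ibs_flat <- integrated_brier(obs_time, surv_flat, ibs_min = 0, ibs_max = 10)
print(c(true = ibs_true, flat = ibs_flat))
```

A constant prediction of 0.5 gives a Brier score of 0.25 at every time, so `ibs_flat` is 0.25 up to rounding, while predictions from the true survival curve score lower.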

ncores

Number of cores used to grow trees in parallel. Default is the number of cores of the computer minus 1.
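The documented default for ncores is all cores but one. A sketch of choosing that value yourself with base R's `parallel::detectCores()`, which can return `NA` on some platforms; whether DynForest computes its default exactly this way is an assumption.

```r
# All available cores but one, never fewer than one.
# detectCores() may return NA, in which case fall back to a single core.
n_detected <- parallel::detectCores()
ncores <- if (is.na(n_detected)) 1L else max(1L, n_detected - 1L)
print(ncores)
```

The result can then be passed explicitly, e.g. `compute_VIMP(DynForest_obj = res_dyn, ncores = ncores)`.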

seed

Seed used to make results reproducible. Default is a random integer between 0 and 10000.

Value

compute_VIMP() returns a list with the following elements:

Inputs

A list of 3 elements: Longitudinal, Numeric and Factor. Each element contains the names of the predictors of that type.

Importance

A list of 3 elements: Longitudinal, Numeric and Factor. Each element contains a numeric vector with the VIMP statistic of each predictor listed in Inputs.

tree_oob_err

A numeric vector containing the out-of-bag (OOB) error of each tree, needed to compute the VIMP statistic.

IBS.range

A vector containing the minimum and maximum times used to compute the IBS.
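To rank predictors across the three types, the Inputs and Importance elements can be combined into one table. A sketch with a hypothetical helper `vimp_table()` (not part of DynForest), assuming each `Importance[[type]]` vector is in the same order as `Inputs[[type]]` and that a type with no predictors is empty or NULL.

```r
# Hypothetical helper: flatten Inputs and Importance into one data frame
# sorted by decreasing VIMP. Assumes Importance[[type]] follows the
# order of Inputs[[type]].
vimp_table <- function(vimp_res) {
  types <- c("Longitudinal", "Numeric", "Factor")
  rows <- lapply(types, function(type) {
    vars <- vimp_res$Inputs[[type]]
    if (length(vars) == 0) return(NULL)   # no predictor of this type
    data.frame(type = type,
               predictor = vars,
               vimp = as.numeric(vimp_res$Importance[[type]]),
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)             # rbind() skips NULL entries
  out <- out[order(out$vimp, decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

# Usage with the object from the Examples section:
# vimp_table(res_dyn_VIMP)
```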

Author(s)

Anthony Devaux (anthony.devaux@u-bordeaux.fr)

Examples


library(DynForest)
data(pbc2)

# Log-transform longitudinal predictors to make them approximately Gaussian
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)

# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id %in% id_sample)

pbc2_train <- pbc2[id_row,]

timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]

# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))

# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])

# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))

# Run DynForest function
res_dyn <- DynForest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)

# Compute VIMP statistic
res_dyn_VIMP <- compute_VIMP(DynForest_obj = res_dyn, ncores = 2)


[Package DynForest version 1.1.0 Index]