het_vimp {cjbart} | R Documentation |
Estimate Variable Importance Metrics for cjbart Object
Description
Estimates random forest variable importance scores for multiple attribute-levels of a conjoint experiment.
Usage
het_vimp(imces, levels = NULL, covars = NULL, cores = 1, ...)
Arguments
imces | Object of class |
levels | An optional vector of attribute-levels to generate importance metrics for. By default, all attribute-levels are analyzed. |
covars | An optional vector of covariates to include in the importance metric check. By default, all covariates are included in each importance model. |
cores | Number of CPU cores used during VIMP estimation. Each extra core will result in greater memory consumption. Assigning more cores than outcomes will not further boost performance. |
... | Extra arguments (used to check for deprecated argument names) |
Details
Having generated a schedule of individual-level marginal component effect estimates, this function fits a random forest model for each attribute-level using the supplied covariates as predictors. It then calculates a variable importance measure (VIMP) for each covariate. The VIMP method assesses how important each covariate is in terms of partitioning the predicted individual-level effects distribution, and can thus be used as an indicator of which variables drive heterogeneity in the IMCEs.
To recover a VIMP measure, this function uses permutation-based importance metrics from random forest models estimated with randomForestSRC::rfsrc(). To permute the data, it uses random node assignment, whereby cases are randomly assigned to a daughter node whenever a tree splits on the target variable (see Ishwaran et al. 2008). Importance is defined in terms of how much random node assignment degrades the performance of the forest. Greater degradation indicates that a variable is more important to prediction.
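The random node assignment idea can be illustrated directly with randomForestSRC. The following is a minimal sketch on simulated data, not the internal code of het_vimp(): the covariate names (age, income, educ), the simulated imce column, and the forest size are assumptions made purely for illustration.

```r
library(randomForestSRC)

set.seed(1)
n <- 300

# Hypothetical subject-level covariates (names are illustrative only)
covs <- data.frame(age = rnorm(n), income = rnorm(n), educ = rnorm(n))

# Simulated individual-level effect that depends only on `age`
covs$imce <- 0.5 * covs$age + rnorm(n, sd = 0.1)

# importance = "random" requests VIMP based on random daughter-node assignment
fit <- rfsrc(imce ~ ., data = covs, ntree = 200, importance = "random")

# For a regression forest, the importance scores are a named numeric vector
vimp_scores <- sort(fit$importance, decreasing = TRUE)
print(vimp_scores)
```

Because only age enters the simulated effect, it should receive the largest score, while the two noise covariates should score close to zero.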
Variance estimates of each variable's importance are subsequently recovered using the delete-d jackknife estimator developed by Ishwaran and Lu (2019). The jackknife method has inherent bias correction properties, making it particularly effective for variable selection exercises such as identifying drivers of heterogeneity.
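A hedged sketch of how such jackknife intervals can be obtained with randomForestSRC::subsample() is shown below. It assumes a recent release of randomForestSRC in which subsample() accepts an importance argument and extract.subsample() returns a ci.jk.Z element; the simulated data, the small B, and the variable names are illustrative assumptions, and het_vimp() may use different settings internally.

```r
library(randomForestSRC)

set.seed(2)
n <- 300

# Hypothetical covariates and a simulated individual-level effect
covs <- data.frame(age = rnorm(n), income = rnorm(n), educ = rnorm(n))
covs$imce <- 0.5 * covs$age + rnorm(n, sd = 0.1)

fit <- rfsrc(imce ~ ., data = covs, ntree = 200)

# Subsample the forest; with bootstrap = FALSE (the default) the variance
# estimates are based on subsampling. B is kept small so the sketch runs quickly.
ss <- subsample(fit, B = 25, importance = "random", verbose = FALSE)

# Extract interval estimates at the 95% level
ci_info <- extract.subsample(ss, alpha = 0.05)

# Assumed element: normal-approximation intervals using the jackknife variance
print(ci_info$ci.jk.Z)
```

In practice a larger B gives more stable variance estimates, at the cost of longer run time.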
Value
A "long" data.frame of variable importance scores for each combination of covariates and attribute-levels, as well as the estimated 95% confidence intervals for each metric.
References
Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS (2008). “Random survival forests.” The Annals of Applied Statistics, 2(3), 841–860.
Ishwaran H, Lu M (2019). “Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival.” Statistics in Medicine, 38(4), 558–582.
See Also
randomForestSRC::rfsrc() and randomForestSRC::subsample()