var_selection_by_permute {bartMachine} | R Documentation |

Performs variable selection using the three thresholding methods introduced in Bleich et al. (2013).

var_selection_by_permute(bart_machine, num_reps_for_avg = 10, num_permute_samples = 100, num_trees_for_permute = 20, alpha = 0.05, plot = TRUE, num_var_plot = Inf, bottom_margin = 10)

`bart_machine` |
An object of class “bartMachine”. |

`num_reps_for_avg` |
Number of replicates to average over for the BART model's variable inclusion proportions. |

`num_permute_samples` |
Number of permutations of the response to be made to generate the “null” permutation distribution. |

`num_trees_for_permute` |
Number of trees to use in the variable selection procedure. As with |

`alpha` |
Cut-off level for the thresholds. |

`plot` |
If TRUE, a plot showing which variables are selected by each of the procedures is generated. |

`num_var_plot` |
Number of variables (in order of decreasing variable inclusion proportion) to be plotted. |

`bottom_margin` |
A display parameter that adjusts the bottom margin of the graph if labels are clipped. The scale of this parameter is the same as set with |

See Bleich et al. (2013) for a complete description of the procedures outlined above as well as the corresponding vignette for a brief summary with examples.
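The role of `num_permute_samples` can be illustrated with a minimal base-R sketch. It is not the package's implementation: the `importance()` helper is a hypothetical stand-in (normalized absolute correlation) used in place of BART variable inclusion proportions, and the data are simulated. It only shows how repeatedly permuting the response yields a null distribution with one row per permutation.

```r
set.seed(1)
n <- 200
p <- 5
X <- matrix(runif(n * p), ncol = p)
y <- 3 * X[, 1] + rnorm(n)  # only the first column carries signal

# Hypothetical stand-in for variable inclusion proportions:
# absolute correlation with the response, normalized to sum to 1.
importance <- function(X, y) {
  score <- abs(cor(X, y))[, 1]
  score / sum(score)
}

# Importance on the actual data.
observed <- importance(X, y)

# Null distribution: shuffle y to break any link with X, then recompute.
num_permute_samples <- 100
null_mat <- t(replicate(num_permute_samples, importance(X, sample(y))))

print(dim(null_mat))  # one row per permutation, one column per predictor
print(round(observed, 3))
```

Because each shuffle destroys the relationship between the predictors and the response, the rows of `null_mat` describe how large an importance score can get by chance alone.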

Invisibly, returns a list with the following components:

`important_vars_local_names` |
Names of the variables chosen by the Local procedure. |

`important_vars_global_max_names` |
Names of the variables chosen by the Global Max procedure. |

`important_vars_global_se_names` |
Names of the variables chosen by the Global SE procedure. |

`important_vars_local_col_nums` |
Column numbers of the variables chosen by the Local procedure. |

`important_vars_global_max_col_nums` |
Column numbers of the variables chosen by the Global Max procedure. |

`important_vars_global_se_col_nums` |
Column numbers of the variables chosen by the Global SE procedure. |

`var_true_props_avg` |
The variable inclusion proportions for the actual data. |

`permute_mat` |
The permutation distribution generated by permuting the response vector. |
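Given `var_true_props_avg` and `permute_mat`, the three cut-offs can be recomputed by hand, for instance to explore a different `alpha`. The sketch below follows the procedures as described in Bleich et al. (2013) and is not the package's own code, which may compute the Global SE multiplier differently. It assumes `permute_mat` has one row per permutation sample and one column per predictor, and it uses simulated stand-ins for both objects so that it runs without bartMachine.

```r
set.seed(42)
alpha <- 0.05
vars <- paste0("X", 1:6)

# Simulated stand-in for permute_mat: 100 permutation samples, rows sum to 1.
raw <- matrix(rgamma(100 * 6, shape = 50), nrow = 100)
permute_mat <- raw / rowSums(raw)
colnames(permute_mat) <- vars

# Simulated stand-in for var_true_props_avg: X1 and X2 stand out.
var_true_props_avg <- c(X1 = 0.40, X2 = 0.30, X3 = 0.075,
                        X4 = 0.075, X5 = 0.075, X6 = 0.075)

perm_mean <- colMeans(permute_mat)
perm_sd <- apply(permute_mat, 2, sd)

# Local: compare each variable with the 1 - alpha quantile of its own null.
local_cut <- apply(permute_mat, 2, quantile, probs = 1 - alpha)
local_vars <- names(var_true_props_avg)[var_true_props_avg > local_cut]

# Global Max: one cut-off, the 1 - alpha quantile of the per-sample maxima.
max_cut <- unname(quantile(apply(permute_mat, 1, max), probs = 1 - alpha))
global_max_vars <- names(var_true_props_avg)[var_true_props_avg > max_cut]

# Global SE: cut-off mean_k + C * sd_k, with C chosen so that roughly a
# 1 - alpha share of permutation samples fall below every cut-off at once.
z_max <- apply(scale(permute_mat, center = perm_mean, scale = perm_sd), 1, max)
c_star <- unname(quantile(z_max, probs = 1 - alpha))
se_cut <- perm_mean + c_star * perm_sd
global_se_vars <- names(var_true_props_avg)[var_true_props_avg > se_cut]

print(local_vars)
print(global_max_vars)
print(global_se_vars)
```

The Local cut-offs are per-variable and so the least strict of the three; the two Global cut-offs account for all predictors simultaneously.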

Although the reference only explores regression settings, this procedure is applicable to both regression and classification problems.
This function is parallelized by the number of cores set in `set_bart_machine_num_cores`.
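As a hedged illustration of the note on parallelization, the sketch below sets the core count before building the model and then uses the returned column numbers to subset the predictors. The core count and the reduced `num_reps_for_avg` and `num_permute_samples` values are arbitrary choices made only to shorten the run; the block does nothing if bartMachine cannot be loaded.

```r
# Sketch only: runs if bartMachine (and a working Java setup) is available.
if (requireNamespace("bartMachine", quietly = TRUE)) {
  library(bartMachine)

  # Set the core count before building the model; 2 is an arbitrary choice.
  set_bart_machine_num_cores(2)

  # Friedman-style data: 5 informative predictors, 15 noise predictors.
  set.seed(11)
  n <- 300
  p <- 20
  X <- data.frame(matrix(runif(n * p), ncol = p))
  y <- 10 * sin(pi * X[, 1] * X[, 2]) + 20 * (X[, 3] - 0.5)^2 +
    10 * X[, 4] + 5 * X[, 5] + rnorm(n)

  bart_machine <- bartMachine(X, y)

  # Smaller replicate counts than the defaults, chosen only to shorten the run.
  var_sel <- var_selection_by_permute(bart_machine,
                                      num_reps_for_avg = 5,
                                      num_permute_samples = 50,
                                      plot = FALSE)

  # Keep only the columns chosen by the Global SE procedure.
  X_selected <- X[, var_sel$important_vars_global_se_col_nums, drop = FALSE]
  print(names(X_selected))
}
```

Since the result is returned invisibly, it has to be assigned (here to `var_sel`) to be inspected afterwards.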

Adam Kapelner and Justin Bleich

J Bleich, A Kapelner, ST Jensen, and EI George. Variable Selection Inference for Bayesian Additive Regression Trees. ArXiv e-prints, 2013.

Adam Kapelner, Justin Bleich (2016). bartMachine: Machine Learning with Bayesian Additive Regression Trees. Journal of Statistical Software, 70(4), 1-40. doi:10.18637/jss.v070.i04

`var_selection_by_permute`, `investigate_var_importance`

## Not run:
library(bartMachine)

#generate Friedman data
set.seed(11)
n = 300
p = 20 ##15 useless predictors
X = data.frame(matrix(runif(n * p), ncol = p))
y = 10 * sin(pi * X[, 1] * X[, 2]) + 20 * (X[, 3] - .5)^2 + 10 * X[, 4] + 5 * X[, 5] + rnorm(n)

##build BART regression model (not actually used in variable selection)
bart_machine = bartMachine(X, y)

#variable selection
var_sel = var_selection_by_permute(bart_machine)
print(var_sel$important_vars_local_names)
print(var_sel$important_vars_global_max_names)

## End(Not run)

[Package *bartMachine* version 1.2.6 Index]