funkyModel {funkycells} | R Documentation |
Fit a Modified Random Forest Model with Bounds and Alignment
Description
The function fits a modified random forest model to principal components of spatial interactions as well as meta-data. Additionally, permutation and cross-validation are employed to improve understanding of the data.
Usage
funkyModel(
  data,
  K = 10,
  outcome = colnames(data)[1],
  unit = colnames(data)[2],
  metaNames = NULL,
  synthetics = 100,
  alpha = 0.05,
  silent = FALSE,
  rGuessSims = 500,
  subsetPlotSize = 25,
  nTrees = 500,
  method = "class"
)
Arguments
data |
Data.frame of outcome and predictors. The predictors include groups of variables, which are finite projections of higher-dimensional variables, as well as single meta-variables. Any replicate data, i.e. repeated observations, should already be handled. The unit column is only needed so that it can be dropped from the data, so removing it beforehand and passing NULL for unit also works. Typically use the results from getKsPCAData, potentially with meta-variables attached. |
K |
(Optional) Numeric indicating the number of folds to use in K-fold cross-validation. The default is 10. |
outcome |
(Optional) String indicating the outcome column name in data. Default is the first column of data. |
unit |
(Optional) String indicating the unit column name in data. Default is the second column of data. |
metaNames |
(Optional) Vector indicating the meta-variables to be considered. Default is NULL. |
synthetics |
(Optional) Numeric indicating the number of synthetics for variables (one set of synthetics for functional variables and one for each meta-variable). If 0 is used, the data cannot be aligned properly. Default is 100. |
alpha |
(Optional) Numeric in (0,1) indicating the significance used throughout the analysis. Default is 0.05. |
silent |
(Optional) Boolean indicating if output should be suppressed when the function is running. Default is FALSE. |
rGuessSims |
(Optional) Numeric value indicating the number of simulations used for random guessing when creating the guess estimate shown on the plot. Default is 500. |
subsetPlotSize |
(Optional) Numeric indicating the number of top variables to include in a subset graph. If this is larger than the total number of variables, no subset graph will be produced. Default is 25. |
nTrees |
(Optional) Numeric indicating the number of trees to use in the random forest model. Default is 500. |
method |
(Optional) Method for rpart tree to build random forest. Default is "class". Currently this is the only tested method. This will be expanded in future releases. |
Value
List with the following items:
model: The funkyForest Model fit on the entire given data.
VariableImportance: Data.frame with the results of variable importance indices from the models and CV. The columns are var, est, sd, and cvSD.
AccuracyEstimate: Data.frame with model accuracy estimates: out-of-bag accuracy (OOB), biased estimate (bias), and random guess (guess). The columns are OOB, bias, and guess.
NoiseCutoff: Numeric indicating noise cutoff (vertical line).
InterpolationCutoff: Vector of numerics indicating the interpolation cutoff (curved line).
AdditionalParams: List of additional parameters for reference: Alpha and subsetPlotSize.
viPlot: ggplot2 object for vi plot with standardized results. It displays ordered underlying functions and meta-variables with point estimates, sd, noise cutoff, and interpolation cutoff all based on variable importance values.
subset_viPlot: (Optional) ggplot2 object for vi plot with standardized results and only top subsetPlotSize variables. It displays ordered underlying functions and meta-variables with point estimates, sd, noise cutoff, and interpolation cutoff all based on variable importance values.
Examples
# Parameters are reduced beyond recommended levels for speed
fm <- funkyModel(
  data = TNBC[, c(1:8, ncol(TNBC))],
  outcome = "Class", unit = "Person",
  metaNames = c("Age"),
  nTrees = 5, synthetics = 10,
  silent = TRUE
)
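
The items listed under Value can be inspected once the model is fit. The following is a minimal sketch, assuming the funkycells package is installed and that the TNBC data set used above becomes available after loading it. The list item and column names are taken from the Value section. The is.null() check is an assumption about how an absent optional plot is represented, not documented behavior.

```r
# Assumes funkycells is installed and provides the TNBC data used above.
library(funkycells)

# Re-fit the small example model so this block runs on its own.
fm <- funkyModel(
  data = TNBC[, c(1:8, ncol(TNBC))],
  outcome = "Class", unit = "Person",
  metaNames = c("Age"),
  nTrees = 5, synthetics = 10,
  silent = TRUE
)

# List the items returned by funkyModel
names(fm)

# Variable importance table (documented columns: var, est, sd, cvSD)
head(fm$VariableImportance)

# Accuracy estimates (documented columns: OOB, bias, guess)
fm$AccuracyEstimate

# Draw the variable importance plot (a ggplot2 object)
print(fm$viPlot)

# subset_viPlot is optional; assumed here to be NULL when not produced
if (!is.null(fm$subset_viPlot)) {
  print(fm$subset_viPlot)
}
```

As the example comment above notes, nTrees and synthetics are reduced here for speed, so the printed estimates are only illustrative.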