vcr.forest.train {classmap} | R Documentation |
Prepare for visualization of a random forest classification on training data
Description
Produces output for the purpose of constructing graphical displays such as the classmap and silplot. The user first needs to train a random forest on the data by randomForest::randomForest. The resulting fit then serves as the trainfit argument of vcr.forest.train.
Usage
vcr.forest.train(X, y, trainfit, type = list(),
k = 5, stand = TRUE)
Arguments
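X | A rectangular matrix or data frame, where the columns (variables) may be of mixed type. |
y | factor with the given class labels. It is crucial that X and y are exactly the same as in the call to randomForest::randomForest. y is allowed to contain NA's. |
trainfit | the output of a randomForest::randomForest training run. |
k | the number of nearest neighbors used in the farness computation. |
type | list for specifying some (or all) of the types of the variables (columns) in X, as in the type argument of cluster::daisy. The list may contain the components "ordratio", "logratio", "asymm" and "symm", each a vector with the names or numbers of the corresponding columns of X. Variables not mentioned in the list are interpreted as usual, so numeric columns are treated as interval-scaled. |
stand | whether or not to standardize numerical (interval scaled) variables by their range as in the original daisy code, for the farness computation. Defaults to TRUE. |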
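The type list has the same form as the type argument of cluster::daisy. As a minimal sketch with made-up toy data, the code below passes such a list to cluster::daisy directly with Gower's metric, to show how a "symm" entry and an unlisted numeric column are interpreted. It only illustrates the type specification; it is not claimed to reproduce the internal farness computation of vcr.forest.train.

```r
library(cluster)  # recommended package shipped with R; provides daisy()

# Toy data (made up): two symmetric binary columns and one numeric column
X <- data.frame(b1  = c(0, 1, 1, 0),
                b2  = c(1, 1, 0, 0),
                num = c(2.5, 3.1, 10.0, 7.4))

# Same form as the 'type' argument: columns 1 and 2 are symmetric binary,
# the unlisted numeric column is treated as interval-scaled
mytype <- list(symm = c(1, 2))

# Gower dissimilarity: binary columns contribute 0 (match) or 1 (mismatch),
# the numeric column contributes |x_i - x_j| / range(num)
d  <- daisy(X, metric = "gower", type = mytype)
dm <- as.matrix(d)
print(round(dm, 3))
# d(1, 2) = (1 + 0 + 0.6 / 7.5) / 3 = 0.36
```

Listing columns by name, e.g. list(symm = c("b1", "b2")), should be equivalent, since the type components accept names or numbers.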
Value
A list with components:
X | The data used to train the forest. |
yint | number of the given class of each case. Can contain NA's. |
y | given class label of each case. Can contain NA's. |
levels | levels of y. |
predint | predicted class number of each case. For each case this is the class with the highest posterior probability. Always exists. |
pred | predicted label of each case. |
altint | number of the alternative class. Among the classes different from the given class, it is the one with the highest posterior probability. Is NA for cases whose y is missing. |
altlab | label of the alternative class. Is NA for cases whose y is missing. |
PAC | probability of the alternative class. Is NA for cases whose y is missing. |
figparams | parameters for computing fig, which can be reused for new data. |
fig | distance of each case i from each class g. Always exists. |
farness | farness of each case from its given class. Is NA for cases whose y is missing. |
ofarness | for each case i, its lowest fig[i,g] to any class g. Always exists. |
trainfit | The trained random forest which was given as an input to this function. |
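The relations between several components can be sketched in base R from a matrix of posterior probabilities and a fig matrix. This is a minimal sketch on made-up toy values, not the package's internal code. It assumes the PAC definition used in the classmap papers, PAC = p_alt / (p_given + p_alt), which the list above only describes as "probability of the alternative class". It also assumes farness is the entry of fig for the given class, which is our reading of the descriptions above.

```r
# Hypothetical posterior probabilities: rows = cases, columns = classes
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.3, 0.6,
                  0.4, 0.5, 0.1), nrow = 3, byrow = TRUE)
levs <- c("a", "b", "c")   # hypothetical class levels
yint <- c(1L, 2L, NA)      # given class numbers; NA = missing label

# predint: class with the highest posterior probability (always exists)
predint <- max.col(probs, ties.method = "first")

# altint / PAC: best class other than the given one (NA if label missing)
altint <- rep(NA_integer_, nrow(probs))
PAC    <- rep(NA_real_, nrow(probs))
for (i in seq_len(nrow(probs))) {
  if (is.na(yint[i])) next
  others    <- setdiff(seq_len(ncol(probs)), yint[i])
  altint[i] <- others[which.max(probs[i, others])]
  # assumed definition: p_alt / (p_given + p_alt)
  PAC[i] <- probs[i, altint[i]] / (probs[i, yint[i]] + probs[i, altint[i]])
}
altlab <- levs[altint]     # NA index gives NA label

# Hypothetical fig matrix: distance of case i from class g
fig <- matrix(c(0.1, 0.8, 0.9,
                0.7, 0.4, 0.2,
                0.5, 0.3, 0.6), nrow = 3, byrow = TRUE)
# farness (assumed): fig entry for the given class, NA if label missing
farness <- rep(NA_real_, nrow(fig))
ok <- !is.na(yint)
farness[ok] <- fig[cbind(which(ok), yint[ok])]
# ofarness: lowest fig[i, g] over all classes g
ofarness <- apply(fig, 1, min)

print(data.frame(predint, altint, altlab, PAC = round(PAC, 3),
                 farness, ofarness))
```

In this toy case, the second case is predicted as class 3 while its given class is 2, so its alternative class is 3 and its PAC exceeds 0.5.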
Author(s)
Raymaekers J., Rousseeuw P.J.
References
Raymaekers J., Rousseeuw P.J. (2021). Silhouettes and quasi residual plots for neural nets and tree-based classifiers. (Open access.)
See Also
vcr.forest.newdata, classmap, silplot, stackedplot
Examples
library(randomForest)
data("data_instagram")
traindata <- data_instagram[which(data_instagram$dataType == "train"), -13]
set.seed(71) # randomForest is not deterministic
rfout <- randomForest(y~., data = traindata, keep.forest = TRUE)
mytype <- list(symm = c(1, 5, 7, 8)) # These 4 columns are
# (symmetric) binary variables. The variables that are not
# listed are interval-scaled by default.
x_train <- traindata[, -12]
y_train <- traindata[, 12]
# Prepare for visualization:
vcrtrain <- vcr.forest.train(X = x_train, y = y_train,
trainfit = rfout, type = mytype)
confmat.vcr(vcrtrain)
stackedplot(vcrtrain, classCols = c(4, 2))
silplot(vcrtrain, classCols = c(4, 2))
classmap(vcrtrain, "genuine", classCols = c(4, 2))
classmap(vcrtrain, "fake", classCols = c(4, 2))
# For more examples, we refer to the vignette:
## Not run:
vignette("Random_forest_examples")
## End(Not run)