vcr.forest.newdata {classmap}    R Documentation

Prepare for visualization of a random forest classification on new data.

Description

Produces the output needed to construct graphical displays, such as the class map, of a random forest classification on new data. Requires the output of vcr.forest.train as an argument.

Usage

vcr.forest.newdata(Xnew, ynew = NULL, vcr.forest.train.out,
                   LOO = FALSE)

Arguments

Xnew

data matrix of the new data, with the same number of columns d as in the training data. Missing values are not allowed.

ynew

factor with class membership of each new case. Can be NA for some or all cases. If NULL, it is assumed to be NA everywhere.

vcr.forest.train.out

output of vcr.forest.train on the training data.

LOO

leave one out. Only used when testing this function on a subset of the training data. Default is LOO=FALSE.
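
The requirements on Xnew and ynew described above can be checked before calling the function. The following is a minimal sketch in base R, not part of the package: the toy data frames Xtrain and Xnew and their values are hypothetical, and reusing the training levels for ynew is an assumption made here so that class numbers line up with the training fit.

```r
# Hypothetical training data and new data, both with d = 3 columns.
Xtrain <- data.frame(a = c(1.2, 0.4, 3.1), b = c(0, 1, 1), c = c(5, 7, 6))
ytrain <- factor(c("genuine", "fake", "genuine"))
Xnew   <- data.frame(a = c(2.0, 0.9), b = c(1, 0), c = c(6, 8))

# Xnew must have the same number of columns as the training data.
stopifnot(ncol(Xnew) == ncol(Xtrain))

# Missing values are not allowed in Xnew.
stopifnot(!any(is.na(Xnew)))

# The label is known only for the first new case; the second is NA.
# Assumption: ynew is built with the same levels as the training response.
ynew <- factor(c("fake", NA), levels = levels(ytrain))
```

Cases with an NA label still receive a prediction, but the components that depend on the given class (such as altint, PAC and farness) are NA for them.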

Value

A list with components:

yintnew

number of the given class of each case. Can contain NA's.

ynew

given class label of each case. Can contain NA's.

levels

levels of the response, from vcr.forest.train.out.

predint

predicted class number of each case. Always exists.

pred

predicted label of each case.

altint

number of the alternative class. Among the classes different from the given class, it is the one with the highest posterior probability. Is NA for cases whose ynew is missing.

altlab

label of the alternative class. Is NA for cases whose ynew is missing.

PAC

probability of the alternative class. Is NA for cases whose ynew is missing.

fig

distance of each case i from each class g. Always exists.

farness

farness of each case from its given class. Is NA for cases whose ynew is missing.

ofarness

for each case i, the lowest fig[i,g] over all classes g. Always exists.
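
How the components above relate to each other can be illustrated on toy inputs. This is a sketch in base R under stated assumptions, not the package's internal code: the matrices post (posterior probabilities) and fig are hypothetical, and the formula PAC = p_alt / (p_given + p_alt) is assumed here from the class map methodology rather than taken from this help page.

```r
# Hypothetical posterior probabilities for 4 cases and 3 classes.
post <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.6, 0.3,
                 0.3, 0.3, 0.4,
                 0.5, 0.4, 0.1),
               nrow = 4, byrow = TRUE)
# Given class numbers (like yintnew); case 3 has no label.
yint <- c(1, 3, NA, 2)

n <- nrow(post)
altint <- rep(NA_integer_, n)
PAC    <- rep(NA_real_, n)
for (i in which(!is.na(yint))) {
  p <- post[i, ]
  p_given <- p[yint[i]]
  p[yint[i]] <- -Inf            # exclude the given class
  altint[i] <- which.max(p)     # most probable class among the others
  p_alt <- post[i, altint[i]]
  PAC[i] <- p_alt / (p_given + p_alt)  # assumed formula, see lead-in
}

# Hypothetical fig matrix: distance of each case i from each class g.
fig <- matrix(c(0.10, 0.80, 0.95,
                0.90, 0.40, 0.70,
                0.99, 0.97, 0.995,
                0.30, 0.60, 0.85),
              nrow = 4, byrow = TRUE)

# farness: fig of each case to its given class, NA when the label is missing.
farness <- rep(NA_real_, n)
labeled <- which(!is.na(yint))
farness[labeled] <- fig[cbind(labeled, yint[labeled])]

# ofarness: lowest fig[i, g] over all classes g, defined for every case.
ofarness <- apply(fig, 1, min)
```

In this toy setting, case 3 has no given label, so its altint, PAC and farness are NA, while its ofarness is still defined.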

Author(s)

Raymaekers J., Rousseeuw P.J.

References

Raymaekers J., Rousseeuw P.J. (2021). Silhouettes and quasi residual plots for neural nets and tree-based classifiers.

See Also

vcr.forest.train, classmap, silplot, stackedplot

Examples

library(randomForest)
data("data_instagram")
traindata <- data_instagram[which(data_instagram$dataType == "train"), -13]
set.seed(71) # randomForest is not deterministic
rfout <- randomForest(y ~ ., data = traindata, keep.forest = TRUE)
mytype <- list(symm = c(1, 5, 7, 8)) # These 4 columns are
# (symmetric) binary variables. The variables that are not
# listed are interval-scaled by default.
x_train <- traindata[, -12]
y_train <- traindata[, 12]
vcrtrain <- vcr.forest.train(X = x_train, y = y_train,
                            trainfit = rfout, type = mytype)
testdata <- data_instagram[which(data_instagram$dataType == "test"), -13]
Xnew <- testdata[, -12]
ynew <- testdata[, 12]
vcrtest <- vcr.forest.newdata(Xnew, ynew, vcrtrain)
confmat.vcr(vcrtest)
stackedplot(vcrtest, classCols = c(4, 2))
silplot(vcrtest, classCols = c(4, 2))
classmap(vcrtest, "genuine", classCols = c(4, 2))
classmap(vcrtest, "fake", classCols = c(4, 2))

# For more examples, we refer to the vignette:
## Not run: 
vignette("Random_forest_examples")

## End(Not run)

[Package classmap version 1.2.3 Index]